Some existing notions of redundancy among association rules allow for a logical-style characterization and 
lead to irredundant bases of absolutely minimum size. One can push the intuition of redundancy further 
and find an intuitive notion of interest of an association rule, in terms of its "novelty" with respect to other 
rules. Namely: an irredundant rule is so because its confidence is higher than what the rest of the rules 
would suggest; then, one can ask: how much higher? 
We propose to measure such a sort of "novelty" through the confidence boost of a rule, which encompasses 
two previous similar notions (confidence width and rule blocking, of which the latter is closely related to the 
earlier measure "improvement"). Acting as a complement to confidence and support, the confidence boost 
helps to obtain small and crisp sets of mined association rules, and solves the well-known problem that, in 
certain cases, rules of negative correlation may pass the confidence bound. We analyze the properties of two 
versions of the notion of confidence boost, one of them a natural generalization of the other. We develop 
efficient algorithmics to filter rules according to their confidence boost, compare the concept to some similar 
notions in the literature, and describe the results of some experiments employing the new notions 
on standard benchmark datasets. We describe an open-source association mining tool that embodies one of 
our variants of confidence boost in such a way that the data mining process does not require the user to 
select any value for any parameter. 
1. INTRODUCTION 

When the now well-known task of association rule mining was defined, the problems faced 
were twofold. First, the quantity of candidate itemsets for antecedent X and consequent Y 
of association rules X → Y grows exponentially with the often already large universe of 
items. The introduction of a support threshold parameter was a key advance that allowed 
for the design of efficient frequent set miners and for the computation of association rules 
in large datasets: there, exploration is limited to those itemsets that appear "often enough" 
as subsets of the transactions, that is, their relative frequency exceeds a certain ratio of the 
transactions; see [Agrawal et al. 1996] and the references there. Then, the second problem is 
that, often, the set of rules provided as output is too large, especially if we consider that its 
purpose is to be read, and understood, by a human. We consider that this problem warrants 
further research, and we attempt to provide here yet one more approach to it. 
These two difficulties are of very different sorts. The exponential growth of candidates is 
essentially a combinatorial, almost technological problem, and all the existing solutions are 
based on the acceptance that, as not all the billions of candidates can be considered within 
reasonable running times, we make do with those that obey the support constraint. However, 
this solution puts on the shoulders of the user the heavy responsibility of choosing the 
support threshold, with little or no guidance about how to do it. 

On the other hand, it is no problem for current computing equipment to extract 
association rules from frequent sets. The proposal in [Agrawal et al. 1996] (and already in 
the early [Luxenburger 1991] where, however, the support bound proposal does not appear) 
is to impose upon association rules X → Y a confidence constraint, that is, a threshold on 
the conditional probability of Y given X. 

Indeed, association rule mining, in essence, amounts to enumerating all the rules that 
are not disproved by the data. As there are exponentially growing quantities of potential 
associations, even relatively large datasets are unable to disprove most of them. Therefore, 
in the standard "support and confidence" framework, it is well known, and easy to check 
using any of the public datasets and free association miners available on the web, that 
high, demanding thresholds for these parameters generally yield few, somewhat 
obvious rules, whereas softening them, as much as the algorithmics (and the user's patience) would 
allow, leads to large numbers of rules, many of them looking very much like each 
other. Often, such an output is not a user-friendly enough result of a data mining process, due to the 
presence of these intuitive redundancies. 

As a preliminary filter, there are several essentially logical definitions of redundancy, 
patterned after similar intuitions in Propositional or First-Order Logic. This leads to 
minimum-size bases, such as the Representative (or Essential) Rules [Aggarwal and Yu 2001; 
Kryszkiewicz 1998b] for plain redundancy or the basis B* [Balcazar 2010c] for closure-based 
redundancy, at confidence threshold γ, that spares the computation of minimal generators 
needed by the Representative Rules, but needs to be complemented with a basis for full 
implications. All these questions are thoroughly surveyed in [Balcazar 2010c]. But even taking 
redundancies into account, the results are, in many cases, unsatisfactory; therefore, many 
alternative quality measures exist for association rules, essentially due to the facts that, 
first, the confidence of a rule X → Y can be high even in cases where the actual 
correlation between X and Y is negative, and, second, it is often extremely difficult to settle for 
thresholds where interesting rules are kept but the total amount of rules can be handled; 
see [Geng and Hamilton 2006; Lenca et al. 2008; Tan et al. 2004] and their references for 
information about the rich research area opened up by these difficulties. We note that, from 
the point of view of the user, the usage of alternative implicational measures leads to an 
even worse situation, as (s)he has to choose again both the measures to apply and their 
corresponding thresholds. The literature on this topic is huge and cannot be reviewed here; 
a discussion of the relationships of our contributions with the most relevant ones among the 
published proposals is deferred to Subsections 7.1 and 7.2. 

Our development is based on the simple consideration that rules can be evaluated for 
"novelty", by comparison with the rest of the rules mined. Actually, the outcome of every 
Data Mining project is expected to offer some degree of novelty. If one ends up identifying 
only facts whose validity is obvious, these would not be really useful. However, to formally 
study the novelty of Data Mining results is far from being a trivial task. Indeed, novelty 
is, in an intuitive sense, a relative notion: it refers to facts that are, somehow, unexpected; 
hence, some "low expectation" reason must exist, and must be due to alternative facts 
or prediction mechanisms. That is: a piece of information is novel or is not, always with 
respect to a given context of previously known facts; definitions of novelty must take into 
account, then, some form of previously available knowledge, a notion hard to formalize 
(Subsection 7.1 describes some approaches, but see e.g. [Padmanabhan and Tuzhilin 2000] 
and the references therein). 
However, as one very partial and probably insufficient, but necessary action, we claim 
that, as a minimum, each rule should be evaluated for novelty by comparison with the rest 
of the rules mined, treated as an "alternative" mechanism [Balcazar 2009]. One can attempt 
to measure to what extent the confidence of the rule is substantially higher than that 
of related rules that would, intuitively, explain the same facts. In the same reference, the 
confidence width is proposed as a measure of a relative form of objective novelty or 
surprisingness of each individual rule with respect to other rules that hold in the same dataset. 
As some intuitive redundancies are not covered by that measure, the same paper also proposes 
to allow some rules to block other rules in case the blocked rule does not bring in 
enough novelty with respect to the blocker. (We give below the precise definitions of these 
notions.) Essentially, these proposals measure novelty through the extent to which the 
confidence value is "robust", taken relative to the confidences of related rules, as opposed to 
the absolute consideration of the single rule at hand. 

To give a hint of the sort of process we are discussing, assume a rule, of confidence say 
75%, is found in a census-like dataset, stating that young people earn lower salaries; in the 
presence of such a rule, a more complex one stating that young, unmarried people earn lower 
salaries could be novel, but only if its confidence turns out to be substantially higher than 
75%, maybe 90%. Otherwise, it would not be novel, the simpler rule should be preferred, 
and even the complex rule discarded (or blocked), all depending on thresholds on confidence 
and on some other parameter such as improvement [Bayardo et al. 1999], blocking factor, 
or confidence boost (to be introduced here). Further discussion will be provided in the 
body of the paper. 

It was empirically demonstrated in [Balcazar 2009] that better results were obtained using 
both a confidence width threshold and a blocking threshold, than using a single one of these 
filters (or none). However, no really fast way of testing a rule for blockings was provided. 
Thus, our contribution here is a new attempt at formalizing the notion of novelty, the 
confidence boost, similar in its syntactic definition to confidence width, but different in its 
semantics, which is more restrictive; its main feature is that it encompasses at once both the 
bound on the confidence width and the ability to detect that a rule would be blocked, so that 
the confidence boost bound embodies both of the bounds proposed in [Balcazar 2009], yet 
it is computable with reasonable efficiency. Confidence boost comes in two flavors: a "plain" 
one, that we develop in Section 3, and a more general variant that takes into account the 
closure space implicit in the data, developed in Section 4. 

Three short extended abstracts of three, six, and seven pages respectively have announced 
results from this paper in scientific meetings; reference [Balcazar 2010b] contains the 
definition of confidence boost, fragments of Section 2 (where we also review a small number of 
necessary facts from [Balcazar 2010c]), part of Section 3 (the definition of confidence boost), 
and the algorithm in Subsection 3.2 (but not its correctness proof). Reference [Balcazar 
2010a] contains the definition of closure-based confidence boost and part of the materials 
in Section 4, again including the main algorithm but not its correctness proof, as well as 
materials from Subsection 5.2. The tool yacaree which embodies closure-based confidence 
boost into a parameter-free association miner (Section 6) was advertised at [Balcazar 2011] 
(demo track). The rest of Sections 3, 4, and 5, as well as the discussions in Section 7, are 
unpublished. 

2. PRELIMINARIES 

A given set of available items U is assumed; its subsets are called itemsets. We will denote 
itemsets by capital letters from the end of the alphabet, and use juxtaposition to denote 
union, as in XY. The inclusion sign as in X ⊂ Y denotes proper subset, whereas improper 
inclusion is denoted X ⊆ Y. For a given dataset D, consisting of n transactions, each of 
which is an itemset labeled with a unique transaction identifier, we can count the support 
s_D(X) of an itemset X, which is the cardinality of the set of transactions that contain X. An 
alternative rendering of support is its normalized version, the relative frequency or empirical 
probability s_D(X)/n; we will work with the unnormalized quantity. 

Association miners explore datasets in search of valid expressions of the form X → Y, 
where X and Y stand for itemsets. Intuitively, an association rule X → Y means that, in the 
given dataset, the transactions that contain X "tend to contain" Y as well. The confidence 
of a rule X → Y is c_D(X → Y) = s_D(XY) / s_D(X), akin to an empirical approximation to a 
conditional probability. It is important to observe that the precise definition of association 
rules depends on the formalization chosen for the informal expression "tend to", as only 
then these syntactical expressions become endowed with a concrete semantics and associated 
specific properties. For instance, if we define the meaning of X → Y through confidence, 
then rules X → Y and X → XY are equivalent, whereas, if we use lift (defined below), 
then they may not be equivalent. 

Confidence is a very natural notion to prune and rank the output of an association rule 
mining algorithm, but we must point out that, due to some objections that we review in 
Subsection 2.2, there exist other proposals of notions to replace confidence. When confidence 
is 1, the maximum value, we say that X → Y is an implication: every transaction containing 
X contains Y as well. Sometimes we use the term partial rule for an association rule of 
confidence less than 1. The support of a rule X → Y is s_D(X → Y) = s_D(XY). When 
the dataset is clear from the context, we will omit the subscript D from both support and 
confidence. We do allow X = ∅ as antecedent of association rules: then the confidence 
coincides with the normalized support, c(∅ → Y) = s(Y)/s(∅) = s(Y)/n. Allowing Y = ∅ 
as consequent as well is possible but not very useful, as this case leads only to trivial rules 
equivalent to reflexivity statements; therefore we assume that such rules are omitted from 
all our sets of rules. In the proposal of [Agrawal et al. 1996], association rules are restricted 
to |Y| = 1. This allows for faster algorithmics, as rules are directly obtained from each 
frequent set. In fact, whereas confidence 1 implications, say, A → B and A → C jointly 
are indeed equivalent to A → BC, for confidence less than 1 they are not. A → BC says 
that B and C appear jointly often with A, whereas associations A → B and A → C, even 
together, provide less information, as B and C could appear often with A but not so much 
together (we offer an example below). Thus, we do not force |Y| = 1. 

In many cases we assume that the context provides for a threshold on the confidence, 
imposing a constraint c(X → Y) ≥ γ on rules, and likewise a support threshold constraint 
s(X → Y) > τ. It is formally convenient to use strict inequality in the latter case, to easily 
cater for the case where no support bound is imposed, by simply taking τ = 0; whereas, 
for confidence, we prefer to be able to select full-confidence implications via the nonstrict 
inequality with γ = 1. 



Remark 2.1. As we consider mainly confidence and support, rules X → Y and X → XY 
are equivalent in almost all our statements, as are all rules where some part of the left-hand 
side X is repeated in the right-hand side. Our novelty notions will respect this 
equivalence as well. The only exceptions will be in our brief considerations of lift. Two natural 
canonical choices to simplify the discussion are to restrict the discussion either to the rules 
of the form X → Y or to those of the form X → XY, where, in both cases, X ∩ Y = ∅. 
We will see in Subsection 7.2 that failing to clarify this option risks overlooking subtle 
differences among sets of rules enjoying, however, quite different properties. Based on the 
similar developments in implications and functional dependencies, we choose the latter: we 
will always make explicit what part of the consequent is already in the antecedent and write 
all our association rules as X → XY where X ∩ Y = ∅. However, this choice is somewhat 
arbitrary, and whoever prefers association rules with disjoint sides only needs to remove 
the copy of the antecedent from the consequent. In fact, in our implementations, at the time 
of showing a rule to the user, of course only the Y part of the consequent is shown. 
Fig. 1. An example closure space, with 
three minimal generators; the dataset 
contains the following transactions: 
ABCDE (×6), ABC (×2), AB (×2), 
CDE (×1), BC (×1) 



Given a dataset D, an itemset X ⊆ U is closed if the support of any strictly larger 
itemset is strictly smaller; and is free, or a minimal generator, if the support of any strictly 
smaller itemset is strictly larger. We denote as X̄ the closure of itemset X with respect to a 
given dataset: X̄ is the smallest closed itemset that includes X or, equivalently, the largest 
itemset that includes X and has the same support as X in the dataset. It is easy to check 
that it is unique. The intersection of closed itemsets is closed and, ordered by inclusion, 
the closed itemsets form a lattice which we call "closure space". We will make liberal use 
of the three characteristic properties of closure operators, namely, extensivity: X ⊆ X̄; 
monotonicity: X ⊆ Y implies X̄ ⊆ Ȳ; and idempotency: the closure of X̄ is X̄ itself. We will mention below 
further details about the connections of closure operators and free sets with association 
mining; see e.g. [Boulicaut et al. 2003; Zaki 2004] for further information. 
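
As an illustration of these definitions, the following Python sketch computes closures and tests freeness on the small dataset listed in the caption of Fig. 1. It relies on the standard characterization of the closure of an itemset as the intersection of all the transactions that contain it; the names DATASET, support, closure and is_minimal_generator are illustrative choices of ours, not part of any tool described in this paper.

```python
# Transactions of the Fig. 1 dataset, repeated according to their multiplicities.
DATASET = (
    [frozenset("ABCDE")] * 6
    + [frozenset("ABC")] * 2
    + [frozenset("AB")] * 2
    + [frozenset("CDE")]
    + [frozenset("BC")]
)


def support(itemset, dataset=DATASET):
    """Unnormalized support: number of transactions that contain the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in dataset if items <= t)


def closure(itemset, dataset=DATASET):
    """Largest itemset of equal support: intersection of the covering transactions."""
    items = frozenset(itemset)
    covering = [t for t in dataset if items <= t]
    if not covering:
        # Assumption of this sketch: a set of support zero closes to the universe.
        return frozenset().union(*dataset)
    return frozenset.intersection(*covering)


def is_minimal_generator(itemset, dataset=DATASET):
    """Free set: dropping any single item strictly increases the support."""
    items = frozenset(itemset)
    s = support(items, dataset)
    return all(support(items - {i}, dataset) > s for i in items)


print(sorted(closure("A")))        # ['A', 'B']
print(sorted(closure("D")))        # ['C', 'D', 'E']
print(is_minimal_generator("A"))   # True
print(is_minimal_generator("AB"))  # False
```

Testing only the subsets obtained by dropping a single item suffices here, because the support never decreases when an itemset shrinks.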

Example 2.2. We will employ as running example through most of this paper the closure 
space obtained from a specific dataset. For this example, the universe U includes the five 
items A, B, C, D, and E. The dataset consists of 12 transactions, six of which include 
all of U; two more consist of ABC, again two transactions consist of AB, and then one 
transaction consists of CDE and another one consists of BC. It is easy to see that the 
associated closure-space lattice is as depicted in Figure 1, where transitive arcs have been 
omitted and, besides the closed sets, three minimal generators (connected to their closures) 
have been indicated in broken lines. The supports of all closed sets are reported in the figure 
for convenience. The support of each minimal generator coincides with that of its closure. 
Note that sometimes the minimal generator coincides with its closure, as in set BC, for 
one. This example illustrates that, at confidence 9/11, both the association rules B → A 
and B → C hold, whereas the stronger rule B → AC does not, as its confidence is only 
8/11. That is, if and when B → AC holds, it would give more information than B → A 
and B → C holding jointly. 
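
The figures quoted in Example 2.2 can be reproduced with a few lines of Python. The sketch below is only an illustration under our own naming (DATASET, support and confidence are hypothetical helpers, not part of any tool discussed here); it uses exact fractions so that values such as 9/11 are not blurred by rounding.

```python
from fractions import Fraction

# The 12 transactions of the running example, with their multiplicities.
DATASET = (
    [frozenset("ABCDE")] * 6
    + [frozenset("ABC")] * 2
    + [frozenset("AB")] * 2
    + [frozenset("CDE")]
    + [frozenset("BC")]
)


def support(itemset, dataset=DATASET):
    """Unnormalized support s(X): number of transactions containing X."""
    items = frozenset(itemset)
    return sum(1 for t in dataset if items <= t)


def confidence(antecedent, consequent, dataset=DATASET):
    """c(X -> Y) = s(XY) / s(X), returned as an exact fraction."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    return Fraction(support(xy, dataset), support(x, dataset))


print(confidence("B", "A"))   # 10/11
print(confidence("B", "C"))   # 9/11
print(confidence("B", "AC"))  # 8/11
```

Both B → A and B → C reach the threshold 9/11, while B → AC falls short of it, which is the point made by the example.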

We will propose to measure the novelty of each rule with respect to the rest of the outcome 
of the same data mining process, through a variant of the intuitive idea of redundancy. 
Several notions of redundancy for association rules exist. In the early proposal [Luxenburger 
1991], a rule is redundant if its confidence can be computed from that of other rules. 
Later, this idea has been refined, making precise what information is maintained and which 
operations are allowed to infer confidence or support of redundant rules: see the survey of 
several concise representations and redundancy notions in [Kryszkiewicz 2002]. In [Pasquier 
et al. 2005] (and in earlier conference versions of their work) the following set of rules is 
shown to be sufficient to compute the confidence and support of any given partial rule: 

Definition 2.3. Given a dataset and a support threshold τ acting on all sets and rules: 

(1) The min-max rules are those of the form X → XY where XY is a closed set and X is 
a minimal generator; they are split into the following two cases. 

(2) The min-max approximate rules are those of the form X → XY where XY is a closed 
set, X is a minimal generator, and the closure X̄ of X satisfies X̄ ⊂ XY. They have confidence less than 1. 

(3) The min-max exact rules are those of the form X → XY where XY is a closed set, X 
is a minimal generator, and the closure X̄ of X satisfies X̄ = XY. They have confidence 1. 

Similar notions of redundancy are studied in [Zaki 2004], where, however, the approximate 
bases are constructed as rules having minimal generators both at the left- and at the 
right-hand sides. These bases are considerably more succinct than the sets of all association rules that 
hold in a specific dataset, yet they still amount to far too large quantities in many cases of 
interest. Therefore, less demanding notions of redundancy for association rules have been 
studied. If we assume that the set of frequent closures is kept, so that confidences are easily 
computed from them, and focus on the "user-centric" view, there is a very precise and 
natural notion that allows us to identify irredundant bases of absolutely minimum size. For 
the whole paper, we concentrate basically on this redundancy notion, and on a somewhat 
more sophisticated variant that we will describe in Section 4. 

Lemma 2.4. Consider two association rules, X₀ → X₀Y₀ and X₁ → X₁Y₁. The following 
are equivalent: 

(1) The confidence and support of X₀ → X₀Y₀ are always larger than or equal to those 
of X₁ → X₁Y₁, in all datasets; that is, for every dataset D, we will have c_D(X₀ → 
X₀Y₀) ≥ c_D(X₁ → X₁Y₁) and s_D(X₀Y₀) ≥ s_D(X₁Y₁) in it. 

(2) X₁ ⊆ X₀ ⊆ X₀Y₀ ⊆ X₁Y₁. 

When these cases hold, we say that X₁ → X₁Y₁ makes X₀ → X₀Y₀ redundant, or also 
that X₁ → X₁Y₁ is logically stronger than X₀ → X₀Y₀. The notions come, essentially, from 
[Aggarwal and Yu 2001; Kryszkiewicz 1998b]. For a fixed confidence threshold, those rules 
that reach it, and are not made redundant by other rules also above the threshold, form 
the representative (or essential) rule basis for that confidence threshold [Aggarwal and Yu 
2001; Kryszkiewicz 1998b; Phan-Luong 2001]; that is, every rule that reaches the confidence 
threshold is either in the corresponding representative basis, or made redundant by a rule 
in the basis. Hence, a redundant rule is so because we can know beforehand, from the 
information in a basis, that its confidence will be above the threshold. These references also 
explain how to compute the representative basis out of the closed itemsets for the dataset. 
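
Condition (2) of Lemma 2.4 reduces the redundancy test to two set inclusions, which can be sketched as follows. The representation of a rule as an (antecedent, consequent) pair of strings and all helper names are assumptions made for this illustration; the numeric check of statement (1) is run on the running example only and is, of course, no substitute for the proof.

```python
from fractions import Fraction

# Running example dataset (Example 2.2), with multiplicities.
DATASET = (
    [frozenset("ABCDE")] * 6
    + [frozenset("ABC")] * 2
    + [frozenset("AB")] * 2
    + [frozenset("CDE")]
    + [frozenset("BC")]
)


def support(itemset):
    items = frozenset(itemset)
    return sum(1 for t in DATASET if items <= t)


def confidence(antecedent, consequent):
    x = frozenset(antecedent)
    return Fraction(support(x | frozenset(consequent)), support(x))


def makes_redundant(stronger, weaker):
    """Condition (2) of Lemma 2.4: X1 <= X0 <= X0Y0 <= X1Y1 (as set inclusions)."""
    x1 = frozenset(stronger[0])
    x1y1 = x1 | frozenset(stronger[1])
    x0 = frozenset(weaker[0])
    x0y0 = x0 | frozenset(weaker[1])
    # X0 <= X0Y0 holds by construction, so two inclusions remain to be tested.
    return x1 <= x0 and x0y0 <= x1y1


weak = ("A", "BC")
for strong in [("A", "BCDE"), ("", "ABC"), ("B", "C")]:
    verdict = makes_redundant(strong, weak)
    if verdict:
        # Statement (1), observed on this single dataset only.
        assert confidence(*weak) >= confidence(*strong)
        assert support(weak[0] + weak[1]) >= support(strong[0] + strong[1])
    print(strong, verdict)
```

Note that, under this test, every rule makes itself (and its equivalent renderings as per Remark 2.1) redundant; definitions that must exclude this case have to say so explicitly.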

The fact that statement (2) implies statement (1) in Lemma 2.4 is easy to see and was 
already pointed out in [Aggarwal and Yu 2001; Kryszkiewicz 1998b; Phan-Luong 2001] (in 
somewhat different terms). The converse implication is nontrivial and was shown much more 
recently [Balcazar 2010c]; see this reference as well for the proof that the representative 
basis has the minimum possible size among all bases for this notion of redundancy, and 
for discussions of other related redundancy notions. In particular, several other natural 
proposals are shown there to be equivalent to this redundancy. Also, from this same source, 
we will consider later on a variant which makes a deeper use of the closure operator. 

A known property that relates representative rules to closure-based miners is: 

Proposition 2.5. On a given dataset and in the presence of a fixed support threshold τ, 
consider the association rule X → XY, and set γ = c(X → XY). The following are 
equivalent: 

(1) X → XY is a representative rule for some confidence threshold. 

(2) X → XY is a min-max rule: XY is a closed set and X is a minimal generator. 

(3) X → XY is a representative rule for confidence threshold γ. 

Hence, whenever we refer to X → XY as a representative rule, without mention of 
the specific confidence threshold γ for which it is so, we implicitly understand that we 
mean γ = c(X → XY). The implication from (1) to (2) is from [Kryszkiewicz 1998a] (see 
also [Kryszkiewicz 2001] for a clearer notation): if X → XY is a representative rule then 
s(X) < s(X') for all X' ⊂ X, and s(Z) < s(XY) for all Z with XY ⊂ Z; that is, X is a 
minimal generator and XY is closed. 

We have not found the other implications explicitly stated, but they appear implicitly, 
in a sense, in the references that discuss these notions. We sketch here the rather simple 
proofs for completeness. Set γ = c(X → XY). We assume that X → XY is a min-max 
rule, and consider a different rule, X' → X'Y', logically stronger than X → XY; we 
must argue that it fails the confidence threshold. By Lemma 2.4, we have X' ⊆ X and 
XY ⊆ X'Y'. If the left-hand sides differ, X' ⊂ X and, X being a minimal generator, 
s(X') > s(X); then c(X' → X'Y') ≤ c(X' → XY) < c(X → XY) = γ. If, instead, 
X' = X, then XY ⊂ X'Y' and, XY being closed, s(X'Y') < s(XY); we obtain that 
c(X' → X'Y') = c(X → X'Y') < c(X → XY) = γ again. The remaining implication, 
(3) to (1), is obvious. 

Example 2.6. One can check that the dataset and the closure space of Example 2.2 
lead to seven representative rules at confidence threshold 0.8, namely, A → BC, C → AB, 
B → C, ∅ → C, ∅ → AB, D → ABCE, and E → ABCD. The first two have confidence 
exactly 0.8, the others have confidences slightly higher. 
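
A brute-force way to double-check Example 2.6 is to enumerate every rule above the thresholds and discard those made redundant, in the sense of Lemma 2.4, by a different rule that is also above the thresholds. The following Python sketch does so for the running example; it is exponential in the number of items and is meant purely as an illustration, not as the closure-based construction described in the cited references. All names are hypothetical, and the threshold is passed as an exact fraction because the binary float 0.8 is slightly larger than 4/5.

```python
from fractions import Fraction
from itertools import combinations

UNIVERSE = "ABCDE"
DATASET = (
    [frozenset("ABCDE")] * 6
    + [frozenset("ABC")] * 2
    + [frozenset("AB")] * 2
    + [frozenset("CDE")]
    + [frozenset("BC")]
)


def support(itemset):
    items = frozenset(itemset)
    return sum(1 for t in DATASET if items <= t)


def subsets(items):
    """All subsets of the given items, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def rules_above(gamma, tau=0):
    """Rules (X, XY), X a proper subset of XY, with s(XY) > tau and c >= gamma."""
    return [(x, z)
            for z in subsets(UNIVERSE) if support(z) > tau
            for x in subsets(z)
            if x != z and Fraction(support(z), support(x)) >= gamma]


def representative_rules(gamma, tau=0):
    """Rules above the thresholds not made redundant by a different such rule."""
    candidates = rules_above(gamma, tau)
    return [(x, z) for (x, z) in candidates
            if not any((x1, z1) != (x, z) and x1 <= x and z <= z1
                       for (x1, z1) in candidates)]


def show(rule):
    x, z = rule
    return "".join(sorted(x)) + " -> " + "".join(sorted(z - x))


basis = representative_rules(Fraction(4, 5))
print(len(basis))                  # 7
print(sorted(show(r) for r in basis))
```

The empty antecedent is printed as an empty string before the arrow, and only the Y part of each consequent is shown.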

For fixed confidence thresholds, the representative rules at that confidence often form a 
properly smaller basis than the min-max rules; this can be achieved because of two reasons. 
One is that, obviously, min-max rules of confidence below the threshold are omitted. But a 
more sophisticated reason is that a representative rule at a given confidence γ may cease to 
be so at lower confidences: at a lower threshold γ' it is possible that a stronger rule appears 
that makes it redundant. This observation is key in the notion of confidence width that we 
review next. 

2.1. Confidence Width 

Along most of our discussions in this paper, we assume that a dataset D and a support 
threshold τ have been fixed: all our rules are assumed to reach strictly above that support 
threshold on D. 

According to the definition of redundancy in Lemma 2.4, all rules in the representative 
basis provide some irredundant information. However, it is often the case that still the 
representative basis contains more rules than reasonable for human inspection. In [Balcazar 
2009], the intuition of redundancy is pushed further in order to gain a perspective of novelty 
of association rules. An irredundant rule of a given confidence c belongs to the basis for 
that confidence threshold γ = c: no rule of that confidence or higher makes it redundant; 
equivalently, all rules that make it redundant have lower confidence. Then, one can ask: 
"how much lower?". This can be evaluated by means of the following definition from the 
same reference: 

Definition 2.7. For an association rule X → XY, consider all rules that are not 
equivalent to X → XY (as per Remark 2.1), but such that X → XY is redundant with respect 
to them, and pick one with maximum confidence in D among them, say X' → X'Y'. The 
confidence width of X → XY in D is: 

\[ w(X \to XY) = \frac{c(X \to XY)}{c(X' \to X'Y')} \]

The condition that X → XY is redundant with respect to X' → X'Y' implies that 
c(X' → X'Y') ≤ c(X → XY), hence the confidence width is always 1 or larger. In fact, 
w(X → XY) is strictly higher than 1 if and only if X → XY is a representative rule. 

To explain better the intuition behind the notion of confidence width, consider a rule 
X → XY of a given confidence, say c(X → XY) = c₀ ∈ [0, 1], and let us see what happens 
as we mine the representative basis at a varying confidence threshold γ. If c₀ < γ, the 
rule at hand will not play any role at all, being of confidence too low for the threshold. At 
γ = c₀, the rule becomes part of the output of any standard association mining process, 
but it could be that some other "logically stronger" rule appears at the same confidence c₀. 
For instance, it could be that both rules A → AB and A → ABC have confidence c₀: then 
A → AB is redundant and will not belong to the basis for that confidence. In this case, the 
confidence width is 1, its smallest possible value. 

If no stronger rule appears at threshold γ = c₀, then X → XY will belong to the 
representative basis for that threshold. Let us keep decreasing the threshold. At some lower 
confidence, a logically stronger rule may appear. If a logically stronger rule shows up early, 
at a confidence threshold γ very close to c₀, then the rule X → XY is not very novel: it 
is too similar to the logically stronger one, and this shows in the fact that the interval of 
confidence thresholds where it is a representative rule is narrow. Its confidence width will 
be barely above 1. On the contrary, a stronger rule may take long to appear: in that case, 
only rules of much lower confidence entail X → XY, so that the fact that it does reach 
confidence c₀ is novel in this sense. The interval of confidence thresholds where X → XY 
is a representative rule is wide, as will be the value of the confidence width. For instance, 
if the confidence of A → AB is 0.9, and all rules that make it redundant have confidences 
below 0.75, the rule is a much better candidate for novelty than it would be if some rule 
like A → ABC had a confidence of 0.88: in this last case, A → AB indeed brings 
in additional information, but its novelty, with respect to the other rules, is not high; it 
only belongs to the basis when the confidence threshold is in the interval (0.88, 0.9]. In the 
other case where all rules that could make it redundant have confidences, say, 0.75 or less, 
then A → AB would belong to the basis for a considerably wider interval of confidences, 
(0.75, 0.9]. It states something really different from the rest of the information mined. As 
an objective novelty measure, thus, confidence width measures the width of the interval of 
confidences in which the rule at hand belongs to the representative basis. 

It is proved in [Balcazar 2009] that, in Definition 2.7, it suffices to consider representative 
rules for the role of X' → X'Y'. 

Example 2.8. For association rule A → BC, of confidence 0.8, in Example 2.2, the 
confidence width is 1.2. The confidence of that rule is at least 20% higher than that 
of any rule that entails it. Indeed, there are two representative rules logically stronger, 
namely A → BCDE (of confidence 0.6) and ∅ → ABC (of higher confidence, 2/3); hence, 
w(A → BC) = (8/10)/(2/3) = 1.2. 
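
The computation in Example 2.8 can be replayed mechanically by maximizing the confidence over all logically stronger rules, as in the following sketch. It is a naive enumeration over the running example with hypothetical helper names, and it assumes that at least one stronger rule exists above the support threshold.

```python
from fractions import Fraction
from itertools import combinations

UNIVERSE = "ABCDE"
DATASET = (
    [frozenset("ABCDE")] * 6
    + [frozenset("ABC")] * 2
    + [frozenset("AB")] * 2
    + [frozenset("CDE")]
    + [frozenset("BC")]
)


def support(itemset):
    items = frozenset(itemset)
    return sum(1 for t in DATASET if items <= t)


def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def confidence_width(antecedent, consequent, tau=0):
    """Confidence of X -> XY divided by the best confidence of a stronger rule."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    # Stronger rules X' -> X'Y' satisfy X' <= X and XY <= X'Y' (Lemma 2.4);
    # the rule itself is excluded, and s(X'Y') must exceed the support threshold.
    stronger = [Fraction(support(z1), support(x1))
                for z1 in subsets(UNIVERSE) if xy <= z1 and support(z1) > tau
                for x1 in subsets(x) if (x1, z1) != (x, xy)]
    # Assumption of this sketch: the list above is not empty.
    return Fraction(support(xy), support(x)) / max(stronger)


print(confidence_width("A", "BC"))  # 6/5, that is, 1.2
print(confidence_width("", "A"))    # 1
```

The second call illustrates the smallest possible width: the rule ∅ → A is matched in confidence by the logically stronger rule ∅ → AB, so it is not a representative rule.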

Below we will need Definition 2.7 in a single formula; for this, we can replace the 
redundancy condition with its characterization according to Lemma 2.4: 

\[ w(X \to XY) = \frac{c(X \to XY)}{\max\{\, c(X' \to X'Y') \mid (X \to XY) \not\equiv (X' \to X'Y'),\ X' \subseteq X,\ XY \subseteq X'Y' \,\}} \]

where again we are assuming that X ∩ Y = ∅ and X' ∩ Y' = ∅. 
For each fixed support, there are rules that are not redundant with respect to any other, 
different rule; then, this quotient is undefined due to the emptiness of the set in the 
denominator, for instance, if all candidate rules to it are of too low support. By convention, 
we use ∞ as value of the confidence width in that case (equivalently, likening the max to 
a zero). We can identify easily which rules have infinite width (this proposition is reported 
here for the first time): 

Proposition 2.9. The value of w(X → XY) is finite and well-defined if and only if 
either X ≠ ∅, or Y has some proper superset Z with s(Z) > τ. 

Proof. Indeed, if X = ∅ and no proper superset of Y reaches support above τ in the dataset, 
then no rule can make ∅ → Y redundant; conversely, for s(Z) > τ, ∅ → Z is different from 
X → XY and makes it redundant if either X ≠ ∅ and Z = XY, or XY ⊂ Z; since this 
second case only needs to be applied to rules with X = ∅, Y ⊂ Z suffices. ■ 

Thus, the only rules of infinite width are of the form ∅ → Z with Z maximal under the 
condition that s(Z) > τ, and their confidence would coincide with the normalized support 
of Z. We observe in passing that, in practice, such maximal Z's usually have a support 
barely above τ, because all supersets must have a support falling below τ; whenever the 
confidence threshold is substantially higher than the normalized support threshold (which 
does not happen always but extremely often), all rules of infinite width will be filtered out 
by the confidence constraint. 

A simple observation, easy to prove, will be useful below for the comparison with confidence
boost: consider the condition $XY \subseteq X'Y'$ in the rules entering the maximization of
the denominator; it can be written equivalently as follows, using the other condition that
$X' \subseteq X$ and the empty-intersection assumptions:



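Proposition 2.10. Assume $X' \subseteq X$, $X \cap Y = \emptyset$, and $X' \cap Y' = \emptyset$. Then $XY \subseteq X'Y'$ if and only if $(X - X') \subseteq Y'$ and $Y \subseteq Y'$.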
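Since Proposition 2.10 is a purely set-theoretic statement, it can be sanity-checked exhaustively on a small universe. The following Python sketch is not part of the original development: the helper names `subsets` and `check_proposition_2_10`, and the choice of a four-item universe, are illustrative assumptions.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def check_proposition_2_10(universe):
    """Return True if the equivalence holds for all admissible X, Y, X', Y'."""
    universe = frozenset(universe)
    for x in subsets(universe):
        for y in subsets(universe - x):            # X and Y are disjoint
            for xp in subsets(x):                  # X' is a subset of X
                for yp in subsets(universe - xp):  # X' and Y' are disjoint
                    left = (x | y) <= (xp | yp)
                    right = (x - xp) <= yp and y <= yp
                    if left != right:
                        return False
    return True


holds = check_proposition_2_10("ABCD")
```

An exhaustive check of this kind is no substitute for the (easy) proof, but it guards against transcription slips in the statement.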



In [Balcazar 2009], some intuitions are described that suggest that, for a confidence threshold
$\gamma$, a natural choice could be to set the confidence width threshold at $2 - \gamma$; however, so
far no formal support for this proposal (or any other proposal, for that matter) is known.

2.2. Blocking Rules 

On the basis of a clear, simple intuition described in many papers (e.g. [Bayardo et al. 1999;
Liu et al. 1999; Padmanabhan and Tuzhilin 2000; Shah et al. 1999; Toivonen et al. 1995], just
to name a few), [Balcazar 2009] also proposes a notion of "rule blocking", whereby a subset
of the antecedent may "block" an association rule, that is, prevent it from being provided in
the output, if the confidence of the rule with the smaller antecedent and the same consequent
is high enough.

The main question behind this option is the following. Consider an association rule $X \to XY$,
and reduce the antecedent to a smaller $Z \subset X$. Intuitively, the rule with the larger
antecedent should be subsumed by the other, but this intuition is due to the human habit of
working with full implications, where this is indeed the case; it no longer holds for
association rules. For instance, at confidence 1, if $A \to C$ holds, then $AB \to C$ also holds,
and does not bring new information. But association rules are not implications; instead,
they relate relative frequencies: compared to $X \to XY$, a smaller antecedent $Z \subset X$ does
not lead to a new rule $Z \to ZY$ that entails it. Actually, for $Z \subset X$, either of $X \to XY$
or $Z \to ZY$ may have arbitrarily higher confidence than the other. Indeed: rule $X \to XY$
speaks about the abundance of $Y$ among the population of transactions that contain $X$;
reducing the antecedent into $Z$ changes the population into, in principle, a larger one, and
$Y$ can be distributed at very different rates along each of these two sets of transactions. The
distribution of $Y$ in the larger population supporting $Z$ can be very imbalanced, so that $Y$
can appear more frequently in either.

Example 2.11. Consider two association rules like $A \to C$ and $AB \to C$. It is easy to
construct examples where almost all transactions with $A$ and $B$ have $C$, but they are a
small fraction of those having $A$, and thus the confidence of $A \to C$ is very small, whereas
that of $AB \to C$ is high, even 1; conversely, $C$ might hold for nearly all of the transactions
having $A$, but it could be that the only transactions having both $A$ and $B$ are those without
$C$ and, then, the confidence of $AB \to C$ can be zero yet the confidence of $A \to C$ can be
very high.

Example 2.12. Returning briefly to the dataset of Example 2.2, it is easy to check that
$c(\emptyset \to BC) < c(A \to BC)$ whereas $c(\emptyset \to C) > c(B \to C)$.

As a consequence, we also find the basis of the criticism that confidence does not
detect negative correlations.

Example 2.13. Fix a confidence threshold at 0.75, and consider a simple dataset with
10 transactions: 3 transactions $BC$, 6 transactions just $C$, and 1 transaction $B$. Then
$c(B \to BC) = 0.75$, reaching the confidence threshold. Most association miners would
report $B \to C$ as interesting at that threshold. However, the correlation between $B$ and $C$
is actually negative. Indeed, $C$ is less frequent among the transactions having $B$ than in
the total population, as $c(\emptyset \to C) = s(C)/n = 0.9$.
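The figures of Example 2.13 can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `DATA`, `support`, and `confidence` are assumptions made here, and itemsets are encoded as strings of single-letter items.

```python
from fractions import Fraction

# Dataset of Example 2.13: 3 transactions BC, 6 transactions C, 1 transaction B.
DATA = [frozenset("BC")] * 3 + [frozenset("C")] * 6 + [frozenset("B")]


def support(items):
    """Absolute support: number of transactions containing all the items."""
    items = frozenset(items)
    return sum(1 for t in DATA if items <= t)


def confidence(x, y):
    """Confidence of the rule X -> Y, that is, s(XY) / s(X)."""
    return Fraction(support(frozenset(x) | frozenset(y)), support(x))


conf_b_c = confidence("B", "C")      # rule B -> C
conf_empty_c = confidence("", "C")   # rule (empty set) -> C
```

Here `conf_b_c` reaches the threshold 0.75 although it is smaller than `conf_empty_c`, which is the negative correlation that the example points out.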

The natural reaction, consisting of a normalization by dividing the confidence by the
(normalized) support of the consequent of the rule, gives a parameter that we find in the
references going by several different names: it has been called interest [Silverstein et al. 1998]
or, in a slightly different but fully equivalent form, strength [Shah et al. 1999]; "lift" seems
to be catching on as a short name, possibly aided by the fact that the Intelligent Miner
system from IBM employed that name. The quantity is well-known in basic probability, as
it measures the deviation from independence, as a multiplicative distance from the case of
fully independent $X$ and $Y$, which would give value 1 for it (see Definition 2.14).



(If supports are already normalized, then the factor $n$ for the dataset size in the numerator
has to be omitted.) The related parameter leverage [Piatetsky-Shapiro 1991] measures
essentially the same thing, but as an additive distance. It must be noted
that, contrary to confidence, the lift of $X \to Y$ does not coincide with the lift of $X \to XY$:
if we are to use lift, then we must be careful to keep the right-hand side $Y$ disjoint from
the left-hand side: $X \cap Y = \emptyset$. Otherwise, misleadingly higher lift values are obtained. Note
also that, in case $X = \emptyset$, the lift trivializes to 1.
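The three remarks above (lift below 1 for a negatively correlated rule, inflated values when the right-hand side overlaps the left-hand side, and the trivial value 1 for an empty antecedent) can be observed on the dataset of Example 2.13. The sketch below assumes absolute supports and computes lift as $n \times s(XY) / (s(X) \times s(Y))$; the function and variable names are illustrative only.

```python
from fractions import Fraction

# Dataset of Example 2.13: 3 transactions BC, 6 transactions C, 1 transaction B.
DATA = [frozenset("BC")] * 3 + [frozenset("C")] * 6 + [frozenset("B")]


def support(items):
    """Absolute support: number of transactions containing all the items."""
    items = frozenset(items)
    return sum(1 for t in DATA if items <= t)


def lift(x, y):
    """n * s(XY) / (s(X) * s(Y)), with absolute supports."""
    n = len(DATA)
    xy = frozenset(x) | frozenset(y)
    return Fraction(support(xy) * n, support(x) * support(y))


lift_disjoint = lift("B", "C")   # right-hand side disjoint from the left-hand side
lift_overlap = lift("B", "BC")   # right-hand side repeats B: inflated value
lift_empty = lift("", "C")       # empty antecedent: trivial value
```

With disjoint sides the value is 5/6, below 1; repeating $B$ on the right-hand side yields 5/2, which is why the disjointness requirement matters.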

However, this natural measure lacks the ability to orient the rules, because, in it, the
roles of $X$ and $Y$ are symmetric. Additionally, lift is limited in its ability to control cases
where $c(Z \to Y) > c(X \to Y)$ for $\emptyset \neq Z \subset X$. We describe a case found in data from real
census information, pointed out also in [Balcazar 2009]. Mining the Adult dataset from Irvine
[Asuncion and Newman 2007] for association rules at 5% support and 100% confidence,
67 (out of 71) rules in the basis are of the form "Husband" + something else $\to$ "Male",
and the other four rules are also of this form except for the addition of one more item in
the consequent. The reason is that the rule "Husband" $\to$ "Male", which we would expect to
hold, does not reach 100% confidence: indeed, tuple 7110 includes the items "Husband" and
"Female" (instead of "Male"). This opens the door to many rules, intuitively uninformative,
that enlarge the left-hand side a bit, just enough to avoid tuple 7110 so as to reach confidence
100%. The whole issue would not be solved by dividing all confidences by the support of
"Male". Further examples are given in the same paper, and in many others such as those
cited above.



Definition 2.14. The lift of rule $X \to Y$ is

$$\frac{c(X \to Y)}{s(Y)/n} = \frac{s(XY) \times n}{s(X) \times s(Y)}.$$

It is desirable to react to the negative correlation problem for confidence and still maintain
orientability. As an alternative approach to this problem, in [Balcazar 2009] the confidence
parameter is used in an intuitive way to find a threshold at which a smaller antecedent
would suggest omitting a given rule. The proposal there is fully equivalent to the following
one:

Definition 2.15. Given rule $X \to XY$, with $X \cap Y = \emptyset$, a proper subset $Z \subset X$ blocks
$X \to XY$ at blocking threshold $b$ if

$$\frac{s(XY) - c(Z \to ZY)\,s(X)}{c(Z \to ZY)\,s(X)} \leq b.$$

The threshold $b$ is intended to take positive but small values, say around 0.2 or lower.
The intuition behind this definition is as follows: we will want to discard rule $X \to XY$ in
case we find a rule $Z \to ZY$, with $Z \subset X$ (and therefore $ZY \subset XY$, also properly), having
"almost" the same confidence, or larger. (In the presence of a support threshold $\tau$, we would
be requiring as well, naturally, that $s(Z \to ZY) > \tau$.) To do this, we compare the number
of tuples having $XY$ with the quantity that would be predicted from the confidence of the
rule $Z \to ZY$.

More precisely, let $c(Z \to ZY) = c$. If $Y$ is distributed along the support of $X$ at the
same ratio as along the larger support of $Z$, we would expect $s(XY) \approx c \times s(X)$: we are,
thus, considering the relative error committed by $c \times s(X)$ used as an approximation to
$s(XY)$. In case the difference in the numerator is negative, it would mean that $s(XY)$ is
even lower than what $Z \to ZY$ would suggest. If it is positive but the quotient is low,
$c(Z \to ZY) \times s(X)$ is still a good approximation to $s(XY)$, and the larger rule
does not bring high enough confidence with respect to the simpler one to be considered: it
remains blocked. But, if the quotient is larger, and this happens for all $Z$, then $X \to XY$
becomes interesting since its confidence is sufficiently higher than suggested by other rules
of the form $Z \to ZY$.

The higher the blocking threshold, the more demanding the constraint is. It can be checked
that the particular problems of the Adult dataset indicated above are actually solved
already by imposing just a tiny blocking threshold (around 0.000075). Again,
the specific choice of value for the blocking threshold is justified in [Balcazar 2009] only in
intuitive terms; however, note for later use that the confidence width bound and the
blocking threshold are related in that paper as follows: if the confidence width bound is $b$,
then the blocking threshold proposed is $b - 1$.

Example 2.16. Due to the inequalities in Example 2.12, we can see that, at any non-negative
blocking threshold, $\emptyset$ blocks $B \to C$:

$$\frac{s(XY) - c(Z \to ZY)\,s(X)}{c(Z \to ZY)\,s(X)} = \frac{s(BC) - c(\emptyset \to C)\,s(B)}{c(\emptyset \to C)\,s(B)} = \frac{9 - 9.16}{9.16} < 0.$$

Likewise, considering $A \to BC$, we have

$$\frac{s(ABC) - c(\emptyset \to BC)\,s(A)}{c(\emptyset \to BC)\,s(A)} = \frac{8 - (9/12) \times 10}{(9/12) \times 10} = \frac{1}{15} \approx 0.067$$

so that this rule would be blocked by $\emptyset$ as soon as a blocking threshold higher than this
quantity is imposed.
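The blocking quotient of Definition 2.15 only needs four absolute supports, so it is easy to evaluate mechanically. The sketch below is an illustration under that assumption; the function names and the argument order are choices made here, not taken from [Balcazar 2009]. It is applied to the second computation of Example 2.16 (supports 8, 10, 9 and dataset size 12, as they appear there) and to the dataset of Example 2.13.

```python
from fractions import Fraction


def blocking_quotient(s_xy, s_x, s_zy, s_z):
    """Relative error of c(Z -> ZY) * s(X) as a prediction of s(XY)."""
    predicted = Fraction(s_zy, s_z) * s_x
    return (s_xy - predicted) / predicted


def blocks(s_xy, s_x, s_zy, s_z, threshold):
    """True if Z blocks X -> XY at the given blocking threshold."""
    return blocking_quotient(s_xy, s_x, s_zy, s_z) <= threshold


# Example 2.16, rule A -> BC against the empty antecedent:
# s(ABC) = 8, s(A) = 10, s(BC) = 9, n = 12.
q_a_bc = blocking_quotient(8, 10, 9, 12)

# Example 2.13, rule B -> C against the empty antecedent:
# s(BC) = 3, s(B) = 4, s(C) = 9, n = 10.
q_b_c = blocking_quotient(3, 4, 9, 10)
```

A negative quotient, as obtained for the rule of Example 2.13, means that the rule is blocked at every non-negative threshold.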

2.3. Support Ratio 

We will relate our values of confidence width and of confidence boost to an expression
essentially employed first, to our knowledge, in [Kryszkiewicz 2001], where no particular
name was assigned to it. Together with other similar quotients, it was introduced with the
aim of providing a faster algorithm for computing representative rules; it turns out that,
as demonstrated in [Balcazar and Tirnauca 2011], this approach is efficient and useful in
practice but runs the risk of providing incomplete output, as actual representative
rules may be missed. The same reference provides further analysis, including almost equally
efficient alternatives whose output is complete.

Here we introduce this notion because it is related to all of our three parameters of 
confidence width, blocking, and confidence boost; it will allow us to explain more carefully 
their mutual relationships, and it allows for confidence boost constraints to be "pushed" 
into a closure mining process, as we will do in Section 6. 

Definition 2.17. In the presence of a support threshold $\tau$, the support ratio of an
association rule $X \to XY$ is

$$\sigma(X \to XY) = \frac{s(XY)}{\max\{\, s(Z) \mid XY \subset Z,\ s(Z) > \tau \,\}}.$$

We see that this measure does not depend on the antecedent $X$ but just on $XY$. Again,
we set its value to $\infty$ if no $Z$ exists as required for the maximization in the denominator.
We have the following relationship:

Proposition 2.18. If the value of $\sigma(X \to XY)$ is finite and well-defined then the
confidence width $w(X \to XY)$ is also finite, and then

$$w(X \to XY) \leq \sigma(X \to XY).$$

Proof. This is easy to see from Proposition 2.9, and by observing that $X \to Z$, for the
$Z \neq XY$ that maximizes the support in the denominator of the support ratio, leads to
$w(X \to XY) \leq c(X \to XY)/c(X \to Z) = s(XY)/s(Z) = \sigma(X \to XY)$ by simplifying the value of
$s(X) \neq 0$. ■

It is clear that $\sigma(X \to XY) \geq 1$ for all rules; $\sigma(X \to XY) = 1$ exactly when $XY$ is not
closed, since these sets are those that have some proper superset $Z$ with the same support.
The following easy consequence is worth mentioning: many of the quantities we study for an
association rule $X \to XY$ are bounded from above by the support ratio and, therefore, will
trivialize to values less than or equal to 1 unless we consider only closed sets $XY$ as
right-hand sides. Together with Proposition 2.5, this is the reason for the importance of the
closure notion in our context, and for the introduction of a closure-aware version of
confidence boost in Section 4.

Example 2.19. Looking again at association rule $A \to BC$ in Example 2.2, we see that
$\sigma(A \to BC) = s(ABC)/s(ABCDE) = 4/3$.

3. CONFIDENCE BOOST 

This section introduces the first, simpler version of our main notion; it is very similar to the 
one given for confidence width, but with a twist that, even though formally tiny, semantically 
changes it far enough so as to encompass the notion of blocking. 

Definition 3.1. The confidence boost of an association rule $X \to XY$ (always with
$X \cap Y = \emptyset$) is

$$\beta(X \to XY) = \frac{c(X \to XY)}{\max\{\, c(X' \to X'Y') \mid (X \to XY) \neq (X' \to X'Y'),\ X' \subseteq X,\ Y \subseteq Y' \,\}}.$$

As in previous cases, the rules in the denominator are implicitly required to clear the
support threshold: $s(X' \to X'Y') > \tau$. Again, in case the set in the denominator is empty,
the confidence boost is infinite by convention. As in Proposition 2.9, we can point out
exactly which rules fall in that case: the same ones, in fact.

Proposition 3.2. The value of $\beta(X \to XY)$ is finite and well-defined if and only if
either $X \neq \emptyset$, or $Y$ has some proper superset $Z$ with $s(Z) > \tau$. That is: the set of rules of
infinite confidence boost coincides with the set of rules of infinite width.

Proof. As in Proposition 2.9, if $X = \emptyset$ and no proper superset of $Y$ reaches support above
$\tau$ in the dataset, then no different rule (of sufficient support) is available for the set in the
denominator. Conversely, $\emptyset \to Z$ belongs to that set if either $X \neq \emptyset$, or $Y \subset Z$. ■

As indicated above, these cases of infinite confidence boost hardly ever appear in practice, 
due to their confidence being below the threshold. 

Example 3.3. Considering again association rule $A \to BC$ in Example 2.2, we find a
value of the confidence boost of 16/15 for this rule. This is obtained as follows: we consider
all rules $X' \to X'Y'$ with $X' \subseteq A$ and $BC \subseteq Y'$ (and different from it); one can see that
the maximum confidence among them is 0.75, attained by $\emptyset \to BC$. Then
$\beta(A \to BC) = 0.8/0.75 = 16/15 \approx 1.067$.

The fact that a low confidence boost corresponds to a low novelty is similar to the
analogous explanation for width, and can be argued intuitively as follows. Suppose that
$\beta(X \to XY)$ is low, say $\beta(X \to XY) \leq b$, where $b$ is just slightly larger than 1. Then,
according to the definition, there must exist some different rule $X' \to X'Y'$, with $X' \subseteq X$
and $Y \subseteq X'Y'$, such that $\frac{c(X \to XY)}{c(X' \to X'Y')} \leq b$, or $c(X' \to X'Y') \geq c(X \to XY)/b$. This
inequality says that the rule $X' \to X'Y'$, stating that transactions with $X'$ tend to have
$X'Y'$, has a relatively high confidence, not much lower than that of $X \to XY$; equivalently,
the confidence of $X \to XY$ is not much higher (it could be lower) than that of $X' \to X'Y'$.
But all transactions having $X$ do have $X'$, and all transactions having $Y'$ have $Y$, so that
the confidence found for $X \to XY$ is not really that novel, given that it does not give so
much additional confidence over a rule that states such a similarly confident, and intuitively
stronger, fact, namely $X' \to X'Y'$.

At a bare minimum, we should not consider association rules with confidence boost 1 or
less. Notice that this solves the objection against confidence that negative correlations go
undetected: for instance, if the support of $B$ is, say, 80%, a rule $A \to B$ of confidence less
than that would yield a confidence boost below 1, due to the rule $\emptyset \to B$.
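To make the definition concrete, the confidence boost can be computed by brute force on a tiny dataset, simply enumerating every alternative rule allowed in the denominator. The sketch below is deliberately naive and is not the efficient procedure developed later in this section; the names `subsets`, `support`, `confidence_boost`, and the parameter `min_support` are illustrative assumptions, and `None` stands for the infinite boost. It is run on the dataset of Example 2.13.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def support(data, items):
    """Absolute support of an itemset."""
    return sum(1 for t in data if items <= t)


def confidence_boost(data, x, y, min_support=0):
    """Brute-force boost of X -> XY; None encodes an infinite boost."""
    universe = frozenset().union(*data)
    c_rule = Fraction(support(data, x | y), support(data, x))
    best = None
    for xp in subsets(x):                          # X' subset of X
        for extra in subsets(universe - xp - y):   # Y' = Y plus extra items
            yp = y | extra
            if xp == x and yp == y:                # skip the rule itself
                continue
            s_alt = support(data, xp | yp)
            if s_alt <= min_support:               # must clear the support bound
                continue
            c_alt = Fraction(s_alt, support(data, xp))
            best = c_alt if best is None else max(best, c_alt)
    return None if best is None else c_rule / best


# Dataset of Example 2.13: 3 transactions BC, 6 transactions C, 1 transaction B.
DATA = [frozenset("BC")] * 3 + [frozenset("C")] * 6 + [frozenset("B")]

boost_b_c = confidence_boost(DATA, frozenset("B"), frozenset("C"))
boost_empty_c = confidence_boost(DATA, frozenset(), frozenset("C"))
boost_empty_bc = confidence_boost(DATA, frozenset(), frozenset("BC"))
```

On this dataset the negatively correlated rule $B \to C$ gets a boost of 5/6, below 1, so it would be discarded, whereas the rule with empty antecedent and consequent $BC$ has no admissible alternative and hence infinite boost.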

3.1. Boost, Lift, Support Ratio, Width, and Blocking 

We present now some analyses clarifying the properties of the confidence boost. First, we 
see that it allows one to filter out rules that would be discarded on the basis of lift, since 
rules of low lift have low confidence boost. 

Proposition 3.4. Let $X \neq \emptyset$; then, the confidence boost $\beta(X \to XY)$ is bounded from
above by the lift of $X \to Y$.

Proof. We simply consider the rule $\emptyset \to Y$, which differs from $X \to Y$ since $X \neq \emptyset$. Its
support is at least that of $X \to Y$ and thus above the support threshold. Clearly, it appears
among the rules considered to maximize the confidence in the denominator of the definition
of $\beta(X \to XY)$, hence $\beta(X \to XY) \leq \frac{c(X \to XY)}{c(\emptyset \to Y)}$; but $c(\emptyset \to Y) = s(Y)/n$ and then

$$\frac{c(X \to XY)}{s(Y)/n}$$

is exactly the lift of $X \to Y$. ■

In the case where $X = \emptyset$, the lift is 1, as already indicated; this value turns out to be
uninformative in this case, since any right-hand side is independent from $\emptyset$. Confidence
boost does apply to this case, being able to detect low novelty through larger consequents.

The only formal difference between confidence boost and confidence width of a rule $X \to XY$
is that, upon exploring alternative rules $X' \to X'Y'$, in the confidence boost the
antecedent $X$ is not required anymore to be a subset of the consequent $X'Y'$, whereas it
must be for $X' \to X'Y'$ to qualify in the computation of the width. More precisely, given that
$X \cap Y = \emptyset$ and $X' \subseteq X$, it follows $X' \cap Y = \emptyset$, so that the condition $Y \subseteq Y'$ is equivalent
to the condition $Y \subseteq X'Y'$. Proposition 2.10 tells us that $XY \subseteq X'Y' \iff (X - X') \subseteq Y'$
and $Y \subseteq Y'$, and we see that confidence boost simply keeps the inclusion among the
right-hand sides $Y \subseteq Y'$ and does not require additionally that $(X - X') \subseteq Y'$ anymore.
This also tells us that all rules $X' \to X'Y'$ that are considered for the maximization in the
denominator in the definition of confidence width are also considered for the corresponding
maximization in confidence boost. Thus, the value of the maximum itself is at least the
same, or possibly larger, and the difference is that the boost case may consider further
candidates to $X' \to X'Y'$. That is:

Proposition 3.5. The confidence boost of a rule is bounded above by its confidence
width: $\beta(X \to XY) \leq w(X \to XY)$. Hence, $\beta(X \to XY) \leq \sigma(X \to XY)$.

The last sentence comes from Proposition 2.18, and was proved directly first in [Balcazar 
et al. 2010b]. For the next theorem, we state separately a simple technical equivalence. 

Lemma 3.6. $Z \subset X$ blocks $X \to XY$ at blocking threshold $b - 1$ if and only if $\frac{c(X \to XY)}{c(Z \to ZY)} \leq b$.

Proof. By definition, $Z \subset X$ blocks $X \to XY$ at blocking threshold $b - 1$ if and only if

$$\frac{s(XY) - c(Z \to ZY)\,s(X)}{c(Z \to ZY)\,s(X)} \leq b - 1.$$

Multiplying both sides of the inequality by $c(Z \to ZY)$, separating the two terms of the
left-hand side, and replacing $s(XY)/s(X)$ by its meaning, $c(X \to XY)$, we find the equivalent
expression

$$c(X \to XY) - c(Z \to ZY) \leq (b - 1)\,c(Z \to ZY)$$

where solving for $b$ leads to

$$\frac{c(X \to XY)}{c(Z \to ZY)} \leq b.$$

All the algebraic manipulations are reversible (in particular, confidences and supports
appearing all along are never zero so we can multiply or divide by them without trouble). ■

We show next that confidence boost embodies exactly both blocking and confidence width,
precisely with the same relation between the thresholds as used in [Balcazar 2009], under
the already stated proviso that all the association rules involved must clear the support
threshold.

Theorem 3.7. For an association rule $X \to XY$, $\beta(X \to XY) \leq b$ if and only if either
$w(X \to XY) \leq b$ or $X \to XY$ is blocked at a blocking threshold $b - 1$.

Proof. First we prove that either of low width or blocking implies low boost. We have already
argued in Proposition 3.5 that $\beta(X \to XY) \leq w(X \to XY)$. Likewise, assume that $Z \subset X$
(proper subset) blocks $X \to XY$ at a blocking threshold $b - 1$. Clearly the rule $Z \to ZY$
differs from $X \to XY$ since $Z$ is a proper subset of $X$, and it fulfills the conditions to enter the
maximum confidence denominator in the definition of confidence boost. This means that
this maximum is at least as large as $c(Z \to ZY)$, and therefore, by Lemma 3.6,

$$\beta(X \to XY) \leq \frac{c(X \to XY)}{c(Z \to ZY)} \leq b.$$

Conversely, we assume now that $\beta(X \to XY) \leq b$ and prove that either $w(X \to XY) \leq b$
or $X \to XY$ is blocked at a blocking threshold $b - 1$. The definition of confidence boost tells
us that there is a different rule $X' \to X'Y'$ ($X' \cap Y' = \emptyset$) for which $s(X'Y') > \tau$, $X' \subseteq X$,
$Y \subseteq Y'$, and $\frac{c(X \to XY)}{c(X' \to X'Y')} \leq b$. We consider two cases, according to whether $X = X'$. If
$X = X'$, necessarily $Y \subset Y'$ properly, thus $XY \subset X'Y'$ properly and $s(X) = s(X')$ plus
$s(X'Y') > \tau$ tells us that

$$w(X \to XY) \leq \frac{s(XY)}{s(X'Y')} = \frac{c(X \to XY)}{c(X' \to X'Y')} \leq b.$$

Otherwise, $X' \subset X$ properly, and $Y \subseteq Y'$ (and a fortiori $X' \cap Y = \emptyset$) gives us
$c(X' \to X'Y) \geq c(X' \to X'Y')$ whence $\frac{c(X \to XY)}{c(X' \to X'Y)} \leq \frac{c(X \to XY)}{c(X' \to X'Y')} \leq b$. Applying again Lemma 3.6,
we obtain that $X'$ blocks $X \to XY$ at blocking threshold $b - 1$. ■

Hence, bounding the confidence boost at $b$ ensures that the rules that would be filtered
by that confidence boost bound are exactly the same as those that would be filtered by
either (or both) of the checks $w(X \to XY) \leq b$ or blocking at threshold $b - 1$. In this sense,
confidence boost embodies both low-novelty tests from [Balcazar 2009], and with the same
thresholds employed there.

We briefly consider the case of rules with a single item in the antecedent. 

Proposition 3.8. Assume that $|X| = 1$ in rule $X \to XY$, that is, the left-hand side
is a single item. Then $\beta(X \to XY)$ coincides with the minimum among the lift of $X \to Y$
and $\sigma(X \to XY)$.

Proof. Let $X' \to X'Y'$ be the rule that leads to $\beta(X \to XY) = c(X \to XY)/c(X' \to X'Y')$.
It must be different from $X \to XY$, and must clear the support threshold.

If $X' \subset X$, as $X$ is a singleton, we have $X' = \emptyset$, $s(X') = n$ (the number of transactions
in the dataset), $Y \subseteq Y'$, $s(Y') \leq s(Y)$, and

$$\beta(X \to XY) = \frac{c(X \to XY)}{c(X' \to X'Y')} = \frac{c(X \to XY)}{s(Y')/n} \geq \frac{c(X \to XY)}{s(Y)/n} = \frac{s(XY) \times n}{s(X) \times s(Y)}$$

which is the value of the lift; but the boost is also less than or equal to the lift by Proposition
3.4, and they must coincide. The support ratio must be at least as high by Proposition 3.5, so
the confidence boost equals the stated minimum.

The other case is where $X' = X$; then, as the two association rules are different, necessarily
$XY \neq X'Y' = XY'$, so that $\sigma(X \to XY) \leq s(XY)/s(XY') = c(X \to XY)/c(X \to XY')$
because we can divide by $s(X) \neq 0$; that is, $\sigma(X \to XY) \leq \beta(X \to XY)$.
The converse inequality is furnished by Proposition 3.5 and, once we have the equality
$\sigma(X \to XY) = \beta(X \to XY)$, the fact that this value is the indicated minimum comes from
Proposition 3.4. ■

Corollary 3.9. Assume a threshold $b$ in place such that $\sigma(X \to XY) > b$ is known,
for $|X| = 1$, that is, for a rule with a single antecedent item. If the lift of $X \to Y$ is less
than $b$, then it equals $\beta(X \to XY)$.

Example 3.10. We revisit again association rule $A \to BC$ in Example 2.2. For this rule,
the lift is 16/15, less than the support ratio 4/3, so that the former coincides with the
confidence boost as per Proposition 3.8. The quantities evaluated in previous examples lead
now to the inequalities

$$\beta(A \to BC) = 16/15 < w(A \to BC) = 6/5 < \sigma(A \to BC) = 4/3$$

which obey, of course, all inequalities we have proved so far and, at the same time, witness
that each inequality may well be proper.
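The chain of inequalities between boost, width, and support ratio (Propositions 3.5 and 2.18) can also be observed by brute force. The sketch below uses a small hypothetical dataset invented for this illustration (it is not the dataset of Example 2.2), a support threshold of 0, and `None` for infinite values; all function names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def support(data, items):
    return sum(1 for t in data if items <= t)


def conf(data, x, xy):
    return Fraction(support(data, xy), support(data, x))


def ratio(data, x, y, admissible):
    """c(X -> XY) divided by the best confidence among admissible alternatives."""
    universe = frozenset().union(*data)
    best = None
    for xp in subsets(x):                       # X' subset of X
        for yp in subsets(universe - xp):       # Y' disjoint from X'
            if (xp, yp) == (x, y) or support(data, xp | yp) == 0:
                continue
            if admissible(xp, yp):
                c = conf(data, xp, xp | yp)
                best = c if best is None else max(best, c)
    return None if best is None else conf(data, x, x | y) / best


def boost(data, x, y):
    return ratio(data, x, y, lambda xp, yp: y <= yp)


def width(data, x, y):
    return ratio(data, x, y, lambda xp, yp: (x | y) <= (xp | yp))


def support_ratio(data, x, y):
    universe = frozenset().union(*data)
    sups = [support(data, x | y | e) for e in subsets(universe - x - y) if e]
    sups = [s for s in sups if s > 0]
    return Fraction(support(data, x | y), max(sups)) if sups else None


# Hypothetical dataset: 1 x ABC, 3 x AB, 1 x A, 3 x B, 2 x C.
DATA = ([frozenset("ABC")] + [frozenset("AB")] * 3 + [frozenset("A")]
        + [frozenset("B")] * 3 + [frozenset("C")] * 2)
A, B = frozenset("A"), frozenset("B")
chain = (boost(DATA, A, B), width(DATA, A, B), support_ratio(DATA, A, B))
```

For the rule $A \to B$ on this toy dataset the three values are 8/7, 2 and 4, so both inequalities are strict there as well.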

3.2. Double-Threshold Confidence 

In order to be of practical use, the confidence boost requires a deeper study. As it
currently stands, it makes no sense to traverse all the alternative rules to be taken into
account for computing the maximum confidence in the denominator. The same sort of
difficulty appears for confidence width and for blocking. A mild precomputation allows one
to compute quite efficiently the width [Balcazar 2009], but the same method does not seem
to work for blocking or boost. In fact, the experiments reported in that reference resort, as
indicated there, to an approximation to blocking.

For the reasons already discussed, we will not be interested in confidence boost bounds of
1 or less; above 1, by Proposition 3.5, we only find representative rules. Given confidence
threshold $\gamma$, we will show that, in order to test the confidence boost threshold, it suffices
to do so against the set of representative rules computed at a lower confidence threshold,
namely $\gamma/b$. Indeed, consider Algorithm 1. The comparisons are written there in such a way
so as to avoid division by zero in the cases of infinite boost, such as $s(XAY) = 0$, which
may potentially be the case.



Algorithm 1: A double confidence threshold algorithm

Data: dataset $\mathcal{D}$; thresholds for support $\tau$, for confidence $\gamma$, and for confidence boost
      $b > 1$; rule $X \to XY$ with $X \cap Y = \emptyset$, $c(X \to XY) \geq \gamma$, and $s(XY) > \tau$
Result: boolean value indicating whether $\beta(X \to XY) > b$
mine $\mathcal{D}$ for the representative rules $\mathcal{R}$ at threshold $\gamma/b$
for each rule $X' \to X'Y' \in \mathcal{R}$ such that $X' \cap Y' = \emptyset$, $X' \subseteq X$ and $Y \subseteq Y'$ do
    if $\exists Z \subset X - X'$ such that $c(X \to XY) \leq b \times c(X'Z \to X'ZY)$ then
        return False
    if $\exists A \in Y' - XY$ such that $c(X \to XY) \leq b \times c(X \to XAY)$ then
        return False
return True
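The two tests of Algorithm 1 translate almost literally into code once the representative rules are available. The Python sketch below assumes that they are supplied by the caller as pairs of disjoint frozensets; mining them is out of scope here. The function names are illustrative, and the list `REP_RULES` was derived by hand for the dataset of Example 2.13 with $\gamma = 0.75$, $b = 21/20$ and support threshold 0, so it should be read as an assumption of the example rather than as the output of a miner.

```python
from fractions import Fraction
from itertools import combinations


def support(data, items):
    return sum(1 for t in data if items <= t)


def confidence(data, x, xy):
    return Fraction(support(data, xy), support(data, x))


def proper_subsets(items):
    """Yield every proper subset of `items` (the full set is excluded)."""
    items = sorted(items)
    for size in range(len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)


def boost_exceeds(data, x, y, b, rep_rules):
    """Return True iff neither test of Algorithm 1 rejects the rule X -> XY."""
    c_rule = confidence(data, x, x | y)
    for xp, yp in rep_rules:
        if (xp & yp) or not (xp <= x and y <= yp):
            continue
        # First test: enlarge X' with a proper subset Z of X - X'.
        for z in proper_subsets(x - xp):
            if c_rule <= b * confidence(data, xp | z, xp | z | y):
                return False
        # Second test: enlarge the consequent with one item A of Y' - XY.
        for a in yp - (x | y):
            if c_rule <= b * confidence(data, x, x | y | {a}):
                return False
    return True


# Dataset of Example 2.13: 3 transactions BC, 6 transactions C, 1 transaction B.
DATA = [frozenset("BC")] * 3 + [frozenset("C")] * 6 + [frozenset("B")]
# Hand-derived rules of confidence at least 0.75 / 1.05: {} -> C and B -> C.
REP_RULES = [(frozenset(), frozenset("C")), (frozenset("B"), frozenset("C"))]
B_BOUND = Fraction(21, 20)

accept_b_c = boost_exceeds(DATA, frozenset("B"), frozenset("C"), B_BOUND, REP_RULES)
accept_empty_c = boost_exceeds(DATA, frozenset(), frozenset("C"), B_BOUND, REP_RULES)
```

Note that the comparisons multiply by `b` instead of dividing, mirroring the pseudocode, so a zero-confidence alternative never causes a division by zero.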



Theorem 3.11. Let $X \to XY$ be a rule of confidence at least $\gamma$. Then, Algorithm 1
accepts it if and only if $\beta(X \to XY) > b$.

Proof. First we see that the rejections are correct. In each case, we just found a rule
$X'' \to X''Y''$ with $X'' \subseteq X$ and $Y \subseteq Y''$, be it $X'Z \to X'ZY$ or $X \to XAY$; also
$X'' \to X''Y'' \neq X \to XY$: in the first case, $Z$ is a proper subset of $X - X'$, so $X'Z \neq X$, and in
the second case the item $A$ did not appear in $X \to XY$. In each case, the rule $X'' \to X''Y''$
enters the maximization in the denominator of the confidence boost and shows that its value is
less than or equal to $b$.

To see that acceptance is correct, assume $\beta(X \to XY) \leq b$: we prove that, at some point,
rule $X \to XY$ must fail one of the two tests in the algorithm. By the definition of confidence
boost, there must exist some rule $X'' \to X''Y''$, different from $X \to XY$, with $X'' \subseteq X$
and $Y \subseteq Y''$, such that $c(X \to XY) \leq b \times c(X'' \to X''Y'')$.

Then, from $c(X \to XY) \geq \gamma$ we infer $c(X'' \to X''Y'') \geq \gamma/b$, so that there must exist a
representative rule at confidence $\gamma/b$, let it be $X' \to X'Y' \in \mathcal{R}$, that makes $X'' \to X''Y''$
redundant (possibly itself): by Lemma 2.4, $X' \subseteq X''$ and $X''Y'' \subseteq X'Y'$. At some point
(unless a correct negative answer is found earlier), the algorithm will consider this rule
$X' \to X'Y' \in \mathcal{R}$. As in the proof of Theorem 3.7, we distinguish two cases.

First assume that $X''$ is a proper subset of $X$, $X'' \subset X$. Since $X' \subseteq X''$, we can consider
$Z = X'' - X' \subset X - X'$: at some point, the algorithm will compare $c(X \to XY)$ to
$b \times c(X'Z \to X'ZY)$. But it holds that $X'Z = X''$ and that $Y \subseteq Y''$, resulting in
$c(X \to XY) \leq b \times c(X'' \to X''Y'') \leq b \times c(X'Z \to X'ZY)$ and failing the test.

Alternatively, assume $X'' \subseteq X$ holds with equality: $X'' = X$. From $X'' \to X''Y'' \neq X \to XY$
(and using $X \cap Y = \emptyset$ and $X'' \cap Y'' = \emptyset$) we know that $Y \subseteq Y''$ is a proper
inclusion: there is some $A \in Y'' \subseteq X'Y'$ that is not in $Y$. Such $A$ is not in $X$ either, because
$X'' \cap Y'' = X \cap Y'' = \emptyset$, and then, in fact, $A \notin X'$, so that $A \in Y' - XY$. In due time, the
algorithm will compare $c(X \to XY)$ to $b \times c(X \to XAY)$. But $X = X''$, and $A \in Y''$ so
that $AY \subseteq Y''$, hence $c(X \to XY) \leq b \times c(X'' \to X''Y'') \leq b \times c(X \to XAY)$ and the test
will fail as well. This completes the proof. ■

4. CLOSURE-BASED CONFIDENCE BOOST 

Representative rules are a minimum size basis for redundancy, defined as per Lemma 2.4;
still, they often constitute a large set. Prior to accepting the option of losing information
in a quantifiable manner, as we are doing via confidence boost, one could consider the
option of using stronger notions of redundancy. Several earlier papers, e.g. [Luxenburger
1991; Pasquier et al. 2005; Zaki 2004], suggest treating the implications, which
allow for more compact bases, separately from the partial rules. In [Balcazar 2010c], besides
another more complicated alternative, we follow up this suggestion as well, and employ a notion
of closure-based redundancy which also turns out to provide a complete basis of provably
minimum size, denoted $B^*$. This option has definite advantages: whereas it provides bases
comparable in size with, and often clearly smaller than, the set of representative rules,
it has the desirable property that the portion of it that refers to partial associations (of
confidence below 1) can be computed faster. The best approaches to the representative
rules need to work on the basis of both the closures lattice plus all the minimal generators
of each closure ([Kryszkiewicz 2001], but see the related discussion in [Balcazar and Tirnauca
2011]); instead, the $B^*$ basis can be computed just from the closures. In this section, we
port confidence boost into closure-based redundancy and the corresponding minimum-size
basis $B^*$.

Closure-based redundancy corresponds to restricting consideration of datasets as a func- 
tion of the closure operator they induce. It is well-known that the closure operator is equiv- 
alently specified by a set of implications, that is, association rules of confidence 1 (see 
e. g. [Zaki 2004]). Closure-based redundancy [Balcazar 2010c] takes into account the closure 
operator indirectly as follows: 

Definition 4.1. Let $B$ be a set of implications. Partial rule $X_0 \to X_0Y_0$ has closure-based
redundancy relative to $B$ with respect to rule $X_1 \to X_1Y_1$ if the inequalities

$$c(X_0 \to X_0Y_0) \geq c(X_1 \to X_1Y_1) \quad\text{and}\quad s(X_0 \to X_0Y_0) \geq s(X_1 \to X_1Y_1)$$

hold in any dataset $\mathcal{D}$ in which all the rules in $B$ hold with confidence 1.

This redundancy has a characterization parallel to that of Lemma 2.4, proved in the same
reference:

Lemma 4.2. Let $B$ be a set of implications. Consider two association rules, $X_0 \to X_0Y_0$
and $X_1 \to X_1Y_1$. The following are equivalent:

(1) Rule $X_0 \to X_0Y_0$ has closure-based redundancy relative to $B$ with respect to rule $X_1 \to X_1Y_1$.
(2) $X_1 \subseteq \overline{X_0}$ and $X_0Y_0 \subseteq \overline{X_1Y_1}$.

The closure operator in the second statement is the one corresponding to the set of
implications $B$.
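When the implications are those holding in a dataset, the corresponding closure of an itemset is the intersection of all the transactions that contain it. A minimal Python sketch of this standard construction follows; the four-transaction dataset is hypothetical, and the convention of closing unsupported itemsets to the whole universe is an assumption of this sketch.

```python
def closure(data, items):
    """Closure of an itemset: intersection of the transactions containing it."""
    items = frozenset(items)
    covering = [t for t in data if items <= t]
    if not covering:
        # Assumed convention: an unsupported itemset closes to all items.
        return frozenset().union(*data)
    return frozenset.intersection(*covering)


def is_closed(data, items):
    return closure(data, items) == frozenset(items)


# Hypothetical dataset in which the implication A -> B holds with confidence 1.
DATA = [frozenset("ABC"), frozenset("AB"), frozenset("BC"), frozenset("C")]

closure_a = closure(DATA, "A")
closure_b = closure(DATA, "B")
```

Here the closure of $A$ is $AB$, reflecting the implication $A \to B$, while $B$ is already closed.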

In all applications, $B$ is the set of full-confidence implications holding in the dataset,
so that the closure operator is actually the one induced by the dataset. For closure-based
redundancy, a minimum-size basis can be constructed as well. Essentially, this basis, denoted
$B^*_\gamma$ for confidence threshold $\gamma$, is defined in a manner analogous to that of the
representative rules, except that it is restricted to rules of the form $X \to XY$ where both $X$
and $XY$ are closed sets, instead of $X$ being a minimal generator as in representative rules.
All these definitions are studied in depth in [Balcazar 2010c].

If we are to employ this notion of redundancy and the $B^*$ basis, then the definition
of confidence boost requires some fine tuning. This basis is often rather small because many
different representative rules could correspond to many left-hand sides that are minimal
generators of the same closure. Such sets of rules become a single rule in $B^*$. But, if we use
the given definition of confidence boost, these rules are syntactically different from the one
in $B^*$ and "kill it" by forcing its boost down to 1. Thus, to avoid trivializing $B^*$, we need
to take into account the closure operator in the definition of boost. The main notion of this
section is as follows:

Definition 4.3. The closure-based confidence boost of a rule X — > XY is f3(X — > XY) = 

_ c(X -» XY) 

max{c(A' -> X'Y') \ (X ^ X 7 V XY ^ XW 7 ), X 1 C X, Y C TV 1 } 

This is the natural definition paralleling the confidence boost when the notion of redun- 
cancy is closure-based: on one hand, the rules in the denominator may resort to the use of 
closures to make the rule at hand redundant, widening the options of redundancy; on the 
other hand, rules that are syntactically different from the rule at hand, but equivalent to 
it in closure-based redundancy, must be discarded, as they trivially entail the rule at hand. 
Failing to discard them unduly trivializes the confidence boost in many cases. Observe that 
the notion of confidence boost in the previous section corresponds to the particular case 
where the closure operator is the identity function. 
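The definition can be checked by brute force on a very small dataset. The following is a minimal sketch, not the algorithm developed in this paper: the toy transactions, the function names, and the `min_support` cutoff (used only to skip itemsets that occur nowhere) are assumptions made for illustration. Passing the identity function as the closure operator recovers the plain confidence boost, as observed above.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    pool = sorted(items)
    for size in range(len(pool) + 1):
        for combo in combinations(pool, size):
            yield frozenset(combo)


def support(itemset, transactions):
    """Number of transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def dataset_closure(itemset, transactions):
    """Closure induced by the dataset: items common to all covering transactions."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:  # unsupported itemset: by convention, close to the full universe
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


def boost(x, y, transactions, closure, min_support=1):
    """Brute-force boost of X -> XY for a given closure operator (exponential cost)."""
    universe = frozenset().union(*transactions)
    xy = x | y
    closed_x, closed_xy = closure(x), closure(xy)
    best = None
    for x2 in subsets(closed_x):              # candidate antecedents X' within closure(X)
        for w in subsets(universe):           # w plays the role of X'Y'
            if not x2 <= w or support(w, transactions) < min_support:
                continue
            closed_w = closure(w)
            if not y <= closed_w:             # Y must lie within closure(X'Y')
                continue
            if closure(x2) == closed_x and closed_w == closed_xy:
                continue                      # equivalent rule: excluded from the maximum
            conf = Fraction(support(w, transactions), support(x2, transactions))
            if best is None or conf > best:
                best = conf
    own = Fraction(support(xy, transactions), support(x, transactions))
    return None if best is None else own / best  # None: no competing rule (assumed convention)


# Hypothetical toy dataset in which items a and b always occur together.
TOY = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
       frozenset("c"), frozenset("d"), frozenset("d")]

closure_based = boost(frozenset("ab"), frozenset("c"), TOY,
                      lambda s: dataset_closure(s, TOY))
plain = boost(frozenset("ab"), frozenset("c"), TOY, lambda s: frozenset(s))
print(closure_based, plain)
```

In this toy case the rule a → abc has the same confidence as ab → abc and forces the plain boost down to 1, whereas the closure-based variant discards it as equivalent and compares against ∅ → c instead.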

Example 4.4. Out of the seven representative rules at confidence threshold 0.8 that we
enumerated in Example 2.6, some are unchanged in $B^*_{0.8}$, such as $C \to AB$, $B \to C$,
$\emptyset \to C$, and $\emptyset \to AB$. Instead of $A \to BC$, we find $AB \to C$, which is
equivalent to it due to the implication $A \to B$; and, due to the implications $D \to CE$ and
$E \to CD$, it suffices to keep $CDE \to AB$ instead of the other two. If we were to employ plain
confidence boost, $\beta(CDE \to AB) \leq 1$, due to rules $D \to ABCE$ and $E \to ABCD$.
Closure-based confidence boost is able to perform a finer distinction. As these two rules have
the same closure of the antecedent, as $\overline{D} = \overline{E} = CDE$, and the same
associated closed set $ABCDE$, they do not enter the computation of the closure-based confidence
boost of $CDE \to AB$, which is actually
$\overline{\beta}(CDE \to AB) = c(CDE \to AB)/c(C \to ABDE) = 10/7 > 1$.

4.1. Double-Threshold Confidence Revisited 

We develop next an algorithm to compute closure-based confidence boost. We just need to
make a number of adjustments to the one given for plain confidence boost: first, one must
explore the rules of the $B^*$ basis for confidence $\gamma/b$, instead of the representative
rules for it, since that is the appropriate basis for closure-based redundancy; and, second, one
must take into account the closure operator at the time of checking whether a specific $B^*$
rule may lead to guaranteeing low boost of the input rule.

Theorem 4.5. Let $X \to XY$ be a rule of confidence at least $\gamma$. Algorithm 2 accepts it
if and only if $\overline{\beta}(X \to XY) > b$.

Proof. We follow essentially the same steps as in Theorem 3.11, although we must argue
more carefully about the places where the closure operator plays a role. Again, we see
first that the rejections are correct. In each case, we just found a rule $X'' \to X''Y''$ with
$X'' \subseteq \overline{X}$ and $Y \subseteq \overline{X''Y''}$, be it $X'Z \to X'ZY$ or
$X \to XAY$. In both cases, $(\overline{X} \neq \overline{X''} \lor \overline{XY} \neq \overline{X''Y''})$
holds: in the first case, $\overline{X'Z} \neq \overline{X}$ is explicitly checked, whereas, for
the second case, $A \in XAY \subseteq \overline{XAY}$ but $A \notin \overline{XY}$. In each case,
the rule $X'' \to X''Y''$ contributes to the maximization in the denominator of the confidence
boost and shows that its value is less than or equal to $b$.

Algorithm 2: A variant of Algorithm 1 for closure-based confidence boost

Data: dataset $\mathcal{D}$; thresholds for support $\tau$, for confidence $\gamma$, and for
      closure-based confidence boost $b > 1$; rule $X \to XY$ with $X \cap Y = \emptyset$,
      $c(X \to XY) \geq \gamma$, and $s(XY) > \tau$
Result: boolean value indicating whether $\overline{\beta}(X \to XY) > b$

mine $\mathcal{D}$ for the basis $B^*_{\gamma/b}$ at threshold $\gamma/b$
for each rule $X' \to X'Y' \in B^*_{\gamma/b}$, where $X' \cap Y' = \emptyset$, with $X' \subseteq \overline{X}$ and $Y \subseteq X'Y'$ do
    if $\exists Z \subseteq \overline{X} - X'$ such that $\overline{X'Z} \subset \overline{X}$ (with inequality) and
       $c(X \to XY) \leq b \times c(X'Z \to X'ZY)$ then
        return False
    if $\exists A \in X'Y' - \overline{XY}$ such that $c(X \to XY) \leq b \times c(X \to XAY)$ then
        return False
return True



To see that acceptance is correct, assume $\overline{\beta}(X \to XY) \leq b$: we prove that, at
some point, rule $X \to XY$ must fail one of the two tests in the algorithm. By the definition of
closure-based confidence boost, there must exist some rule $X'' \to X''Y''$ with
$X'' \subseteq \overline{X}$, $Y \subseteq \overline{X''Y''}$, and
$(\overline{X} \neq \overline{X''} \lor \overline{XY} \neq \overline{X''Y''})$, and such that
$c(X \to XY) \leq b \times c(X'' \to X''Y'')$. Then, from $c(X \to XY) \geq \gamma$ we infer
$c(X'' \to X''Y'') \geq \gamma/b$, so that there must exist a rule in the basis $B^*_{\gamma/b}$,
let it be $X' \to X'Y'$, that makes $X'' \to X''Y''$ redundant (possibly itself) under
closure-based redundancy. By Lemma 4.2, $X' \subseteq \overline{X''}$ and
$\overline{X''Y''} \subseteq \overline{X'Y'} = X'Y'$, where the last equality is due to the fact
that $X' \to X'Y' \in B^*_{\gamma/b}$ so that $X'Y'$ is closed. At some point (unless a correct
negative answer is found earlier), the algorithm will consider this rule
$X' \to X'Y' \in B^*_{\gamma/b}$. As in the proof of Theorem 3.7, we distinguish two cases.

First assume that $\overline{X''} \subset \overline{X}$. Since $X' \subseteq \overline{X''}$, we
can consider $Z = \overline{X''} - X' \subseteq \overline{X} - X'$: at some point, the algorithm
will compare $c(X \to XY)$ to $b \times c(X'Z \to X'ZY)$. But it holds that
$X'Z = \overline{X''}$ and that $Y \subseteq \overline{X''Y''}$, resulting in
$c(X \to XY) \leq b \times c(X'' \to X''Y'') = b \times c(\overline{X''} \to \overline{X''Y''}) \leq b \times c(X'Z \to X'ZY)$
and failing the test.

Alternatively, let's consider the case where $\overline{X''} \subseteq \overline{X}$ holds with
equality: $\overline{X''} = \overline{X}$, so that $\overline{XY} \neq \overline{X''Y''}$; on the
other hand, we know now $X \subseteq \overline{X} = \overline{X''} \subseteq \overline{X''Y''}$,
and also $Y \subseteq \overline{X''Y''}$, so that $\overline{XY} \subseteq \overline{X''Y''}$.

Assume briefly that $Y'' \subseteq \overline{XY}$: as
$X'' \subseteq \overline{X''} = \overline{X} \subseteq \overline{XY}$, we would obtain
$\overline{X''Y''} \subseteq \overline{XY}$ and, therefore, the equality
$\overline{XY} = \overline{X''Y''}$; however, we know that this equality does not hold.

Hence, $Y''$ is not included in $\overline{XY}$, and there is some $A \in Y'' \subseteq X'Y'$ that
is not in $\overline{XY}$, that is, $A \in X'Y' - \overline{XY}$. (If we know that
$\overline{X} = X$, for instance when the rule $X \to XY$ comes from a $B^*$ basis,
$X' \subseteq \overline{X''} = \overline{X} = X$ tells us that the search for $A$ can be
circumscribed further to just $A \in Y' - \overline{XY}$.) In due time, the algorithm will compare
$c(X \to XY)$ to $b \times c(X \to XAY)$. But $\overline{X} = \overline{X''}$, and $A \in Y''$ so
that $XAY \subseteq \overline{X''Y''}$, hence
$c(X \to XY) \leq b \times c(X'' \to X''Y'') = b \times c(\overline{X''} \to \overline{X''Y''}) \leq b \times c(X \to XAY)$
and the test will fail as well. This completes the proof. ■

We report on a second algorithm below. 

4.2. Inequalities 

Compared to confidence boost, closure-based confidence boost relaxes the alternative rules
to which a given rule is compared, e.g. by allowing left-hand sides included in $\overline{X}$
that are not included in $X$; but, on the other hand, restricts them by the proviso that the
rules are "inequivalent" in a closure-based sense, and not just different. Therefore, either can
end up being higher than the other, and the relationship with other quantities like width or
support ratio becomes less clear. We must review which inequalities still hold; we start with
the (partial) analogs of Propositions 3.5 and 3.4.

Proposition 4.6. Assume $XY$ closed. Then, the closure-based confidence boost is
bounded by the support ratio: $\overline{\beta}(X \to XY) \leq \sigma(X \to XY)$.

Proof. Let $Z$ be the proper superset of $XY$ of largest support above $\tau$, so that
$\sigma(X \to XY) = s(XY)/s(Z)$. As $XY$ is closed, $\overline{Z} \neq \overline{XY}$. Rule
$X \to Z$ enters, therefore, the maximization in the denominator of the closure-based confidence
boost and leads to
$\overline{\beta}(X \to XY) \leq c(X \to XY)/c(X \to Z) = s(XY)/s(Z) = \sigma(X \to XY)$. ■

Proposition 4.7. Assume $s(X) < n$, the dataset size; then, the closure-based confidence
boost $\overline{\beta}(X \to XY)$ is bounded above by the lift of $X \to Y$.

Proof. We consider the rule $\emptyset \to Y$. For it to play a role in closure-based confidence
boost, we need $\overline{\emptyset} \neq \overline{X}$, which is equivalent to $s(X) < n$. The
rest of the argumentation is as in Proposition 3.4: its support is above the threshold, and
$\overline{\beta}(X \to XY) \leq \frac{c(X \to XY)}{c(\emptyset \to Y)}$, which is the lift of
$X \to Y$. ■
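For the reader's convenience, the quotient in this proof can be unfolded into the usual expression for lift. The short derivation below assumes, as elsewhere in this section, that $s(\cdot)$ counts transactions, so that $s(\emptyset) = n$:

```latex
\[
\frac{c(X \to XY)}{c(\emptyset \to Y)}
  = \frac{s(XY)/s(X)}{s(Y)/s(\emptyset)}
  = \frac{s(XY)/s(X)}{s(Y)/n}
  = \frac{n \cdot s(XY)}{s(X) \cdot s(Y)} .
\]
```

A value above 1 indicates that $X$ and $Y$ co-occur more often than independence would suggest, and a value below 1 indicates negative correlation.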

It is interesting to note that the condition about the left-hand side being nonempty in 
Proposition 3.4 corresponds now to having support less than the dataset size: the intuition 
is that any items that appear in all transactions become part of the closure of the empty 
set, which is now the limit case. 

We discuss now some relationships between the plain and the closure-based versions of 
the confidence boost. 

Proposition 4.8. Let $X \to XY$ be an association rule where $XY$ is a closed set and
$X$ is a minimal generator. Then, $\overline{\beta}(X \to XY) \leq \beta(X \to XY)$.

Proof. Let $\beta(X \to XY) = b$: there must be a different rule $X' \to X'Y'$ such that
$X' \subseteq X$, $Y \subseteq Y'$, and $\frac{c(X \to XY)}{c(X' \to X'Y')} = b$. Assume first
that $X' \subset X$. As $X$ is a minimum generator, any proper subset of $X$ has strictly larger
support. Hence, $s(\overline{X}) = s(X) \neq s(X') = s(\overline{X'})$, which implies that
$\overline{X} \neq \overline{X'}$; then, the same rule $X' \to X'Y'$ is accounted for in
$\overline{\beta}$ as well, and leads to a value of at most $b$.

The remaining case is $X = X'$, which requires that $XY \neq X'Y'$. Moreover, both
$X = X' \subseteq X'Y'$ and $Y \subseteq X'Y'$ by the definition of confidence boost, and $XY$ is
closed, so that $\overline{XY} = XY \subset X'Y' \subseteq \overline{X'Y'}$. Again in this case
$X' \to X'Y'$ is accounted for in $\overline{\beta}$, and the stated inequality holds. ■

Corollary 4.9. Let $X \to XY$ be a representative rule at any confidence threshold;
then $\overline{\beta}(X \to XY) \leq \beta(X \to XY)$.

One interesting particular case is that of rules of confidence 1 formed when $X$ is a minimum
generator of the closed set $XY$ itself; these rules form the min-max exact basis from
Definition 2.3 [Pasquier et al. 2005] (a nonminimal basis for the implications of confidence 1,
as the GD basis is sometimes smaller [Guigues and Duquenne 1986]). Proposition 4.8 applies
to these rules as well, of course. On the other hand, we have:

Proposition 4.10. Let $X \to XY$ be an association rule where both $X$ and $XY$ are
closed sets. Then, $\beta(X \to XY) \leq \overline{\beta}(X \to XY)$.

Proof. Let $\overline{\beta}(X \to XY) = b$: there must be a rule $X' \to X'Y'$ such that
$\frac{c(X \to XY)}{c(X' \to X'Y')} = b$, fulfilling the conditions $X' \subseteq \overline{X}$,
$Y \subseteq \overline{X'Y'}$, and either $\overline{X} \neq \overline{X'}$ or
$\overline{XY} \neq \overline{X'Y'}$. We observe first that, as $X$ is closed,
$X' \subseteq \overline{X} = X$. Together with $X \cap Y = \emptyset$, we get for later use that
$X' \cap Y = \emptyset$ as well.

We modify the rule $X' \to X'Y'$ by extending its right-hand side into a closed set, as
$X' \to \overline{X'Y'}$, which has the same confidence, and then rewrite it into $X' \to X'Y''$
by setting $Y'' = \overline{X'Y'} - X'$. Note that $Y \subseteq \overline{X'Y'}$, together with
$X' \cap Y = \emptyset$, leads to $Y \subseteq Y''$.

Hence, with that rule written in this form, the properties become
$\frac{c(X \to XY)}{c(X' \to X'Y'')} = b$, $X' \subseteq \overline{X} = X$, $Y \subseteq Y''$, and
either $\overline{X} \neq \overline{X'}$ or $\overline{XY} \neq \overline{X'Y''}$. It suffices to
show that $X' \to X'Y''$ and $X \to XY$ are different rules to ensure that $X' \to X'Y''$
participates in the computation of $\beta(X \to XY)$ and, hence, to obtain the desired
inequality. But: if $\overline{X} \neq \overline{X'}$, then necessarily $X \neq X'$; and, in the
other case, $XY = \overline{XY} \neq \overline{X'Y''} = X'Y''$ as both $XY$ and
$X'Y'' = \overline{X'Y'}$ are closed sets. This completes the proof. ■

As the B* basis consists of rules where both antecedent X and consequent XY are closed 
sets, we obtain: 

Corollary 4.11. Let $X \to XY$ be a rule in the $B^*$ basis (at confidence $c(X \to XY)$);
then, $\beta(X \to XY) \leq \overline{\beta}(X \to XY)$.

For the not unusual cases where a representative rule participates as well in the B* basis, 
Section 3 suggests measuring its confidence boost, whereas Section 4 would propose to 
measure its closure-based confidence boost. Now we see that there is no conflict: 

Corollary 4.12. If $X \to XY$ is both a representative rule and a member of the $B^*$
basis (both at confidence $c(X \to XY)$), then $\beta(X \to XY) = \overline{\beta}(X \to XY)$.

This follows at once from Corollaries 4.9 and 4.11. 

Example 4.13. In general, either of $\beta$ and $\overline{\beta}$ can be strictly larger, when
permitted by the statements we have proved so far. In Example 4.4, we saw a $B^*$ rule for which
$\overline{\beta}(CDE \to AB) > \beta(CDE \to AB)$. This also shows that Corollary 4.9 cannot be
extended to the $B^*$ basis. Conversely, as $\overline{A} = AB$ in our running example, rule
$B \to C$ is taken into account for the closure-based confidence boost of the representative rule
$A \to BC$, leading to $\overline{\beta}(A \to BC) < 1$, whereas $\beta(A \to BC) = 16/15$ as we
saw in Example 3.10.

We develop some further inequalities and yet another algorithm that we will employ in 
Section 6. 

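Theorem 4.14. Assume that a threshold $b$ has been fixed for the closure-based confidence
boost. Consider rule $X \to XY$ where both $X$ and $XY$ are closed sets. Then
$\overline{\beta}(X \to XY) \leq b$ if and only if either $\sigma(X \to XY) \leq b$, or there is
some closed proper subset $X' \subset X$ with $c(X \to XY) \leq b \times c(X' \to X'Y)$.

Proof. Assume first $\overline{\beta}(X \to XY) \leq b$. Let $X' \to X'Y'$ be the rule in the
denominator of the definition of $\overline{\beta}$ that leads to its actual value. Due to
$Y \subseteq \overline{X'Y'}$, we have
$c(\overline{X'} \to \overline{X'}Y) \geq c(X' \to X'Y')$. If $\overline{X'} \neq \overline{X}$,
as $X$ is assumed closed, we can state $X' \subseteq \overline{X} = X$ so that, by monotonicity,
$X' \subseteq \overline{X'} \subset \overline{X} = X$. Thus,
$\frac{c(X \to XY)}{c(\overline{X'} \to \overline{X'}Y)} \leq \frac{c(X \to XY)}{c(X' \to X'Y')} = \overline{\beta}(X \to XY) \leq b$,
and the second case holds. If, on the other hand, $\overline{X'} = \overline{X}$, then
$s(X) = s(\overline{X}) = s(\overline{X'}) = s(X')$ and, necessarily,
$\overline{XY} \neq \overline{X'Y'}$: yet $XY = \overline{XY} \subseteq \overline{X'Y'}$, as $XY$
is closed, hence $XY = \overline{XY} \subset \overline{X'Y'}$, leading to
$\sigma(X \to XY) \leq \frac{s(XY)}{s(X'Y')} = \frac{c(X \to XY)}{c(X' \to X'Y')} = \overline{\beta}(X \to XY) \leq b$.

Conversely, if $\sigma(X \to XY) \leq b$ then $\overline{\beta}(X \to XY) \leq b$ by
Proposition 4.6. Also, assuming $X' \subset X$ gives us $c(X \to XY) \leq b \times c(X' \to X'Y)$,
where both $X$ and $X'$ are closed, $\overline{X'} = X' \subset X = \overline{X}$ so that
$\overline{X'} \neq \overline{X}$, and rule $X' \to X'Y$ participates in the computation of
$\overline{\beta}(X \to XY)$, leading to
$\overline{\beta}(X \to XY) \leq \frac{c(X \to XY)}{c(X' \to X'Y)} \leq b$. ■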



ACM Journal Name, Vol. V, No. N, Article A, Publication date: January YYYY. 






Jose L Balcazar 



For convenience in a later application, we restate this theorem in its contrapositive form: 

Corollary 4.15. Assume that a threshold $b$ has been fixed for the closure-based confidence
boost. Consider rule $X \to XY$ where both $X$ and $XY$ are closed sets. Then
$\overline{\beta}(X \to XY) > b$ if and only if both $\sigma(X \to XY) > b$ and, for every closed
proper subset $X' \subset X$, $c(X \to XY) > b \times c(X' \to X'Y)$.

Yet another application of this theorem is to identify the analog of Proposition 3.8 for
the closure-based case. To get there, it is convenient to factor out of the proof the following
technical but easy fact:

Lemma 4.16. Let $X$ be a closed singleton, that is, $\overline{X} = X$ and $|X| = 1$. If
$s(X) < n$, then there is exactly one closed proper subset of $X$, namely
$\overline{\emptyset} = \emptyset$; and, besides, $X$ is free, that is, it is a minimum generator
of itself.

Proof. By definition, $\overline{\emptyset}$ contains exactly those items that appear in all the
transactions. By monotonicity, as $\emptyset \subseteq Z$ for all $Z$, $\overline{\emptyset}$ is a
subset of all closures. If $X$ is a closed singleton, either $\overline{\emptyset} = \emptyset$ or
$\overline{\emptyset} = X$; this second case is ruled out by the condition $s(X) < n$, as
$s(\overline{\emptyset}) = s(\emptyset) = n$. Our statements follow. ■

Proposition 4.17. Assume that $|X| = 1$ in rule $X \to XY$, that is, the left-hand side
is a single item. Further, assume that $s(X) < n$, and that $X$ and $XY$ are closed. Then
$\overline{\beta}(X \to XY)$ coincides with the minimum among the lift of $X \to Y$ and
$\sigma(X \to XY)$.

Proof. By Propositions 4.6 and 4.7, we already know that $\overline{\beta}(X \to XY)$ is less
than or equal to both quantities, under the given conditions. To complete the proof, we only need
to show the converse inequality, that is, $\overline{\beta}(X \to XY)$ is larger than or equal to
the minimum among the lift of $X \to Y$ and $\sigma(X \to XY)$. For this, we will apply
Theorem 4.14: $\overline{\beta}(X \to XY) \leq b$ if and only if either $\sigma(X \to XY) \leq b$
or there is some closed proper subset $X' \subset X$ with
$c(X \to XY) \leq b \times c(X' \to X'Y)$. We observe that, by Lemma 4.16, in our current
conditions there is exactly one such $X'$, namely $\emptyset$, and the last inequality becomes,
then, the statement that the lift of $X \to Y$ is at most $b$; indeed, the lift coincides with
$\frac{c(X \to XY)}{c(\emptyset \to Y)}$.

As we can choose any value of $b$, we pick simply $b = \overline{\beta}(X \to XY)$ itself, so
that we can infer that either $\sigma(X \to XY) \leq b = \overline{\beta}(X \to XY)$ or the lift
of $X \to Y$ is also at most $b = \overline{\beta}(X \to XY)$. Thus, either $\sigma(X \to XY)$ or
the lift of $X \to Y$ is less than or equal to $\overline{\beta}(X \to XY)$ and, certainly, the
lesser of both quantities obeys the same bound, which completes the proof. ■

We obtain the corresponding variant of Corollary 3.9: 

Corollary 4.18. Assume a threshold $b$ in place such that $\sigma(X \to XY) > b$ is known,
for $|X| = 1$, that is, for a rule with a single antecedent item. If $s(X) < n$, $X$ and $XY$ are
closed, and the lift of $X \to Y$ is less than $b$, then it equals $\overline{\beta}(X \to XY)$.

As a consequence, $\beta(X \to XY) = \overline{\beta}(X \to XY)$ for these cases. This is also
consistent with Corollary 4.12: as we have stated in Lemma 4.16, in this case $X$ is both closed
and a minimal generator; if $c(X \to XY) < 1$, then this implies that it is equivalent to state
that $X \to XY$ is a representative rule and to state that it is in the $B^*$ basis. This
corollary will be very relevant in the implementation described in Section 6.

4.3. Alternative Algorithm 

Theorem 4.14 leads to an alternative algorithm to filter rules from the B* basis according
to their closure-based confidence boost; we present it as Algorithm 3. Its correctness is
immediate from Theorem 4.14. This algorithm is part of the tool described in Section 6;
it tends to be better than the previous one when left-hand sides tend to be small. It pays
the price of traversing all closed subsets of a given closed set but spares traversing the
alternative basis at lower confidence. In our implementation, as described below, the test of
the support ratio is actually pushed into the closure mining, so that it becomes unnecessary
to repeat it at the time of evaluating rules.



Algorithm 3: An alternative algorithm for closure-based confidence boost

Data: dataset $\mathcal{D}$; thresholds for support $\tau$, for confidence $\gamma$, and for
      closure-based confidence boost $b > 1$; rule $X \to XY$ with $X \cap Y = \emptyset$,
      $X$ and $XY$ both closed, $c(X \to XY) \geq \gamma$, and $s(XY) > \tau$
Result: boolean value indicating whether $\overline{\beta}(X \to XY) > b$

if $\sigma(X \to XY) \leq b$ then
    return False
if $\exists Z \subset X$ closed such that $c(X \to XY) \leq b \times c(Z \to ZY)$ then
    return False
return True
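The two tests of Algorithm 3 translate almost literally into code. The following is a minimal sketch under stated assumptions and is not the code of the tool described in Section 6: transactions are frozensets, closures and supports are recomputed naively on each call, the helper names and the toy dataset are hypothetical, and a rule whose right-hand side has no proper superset above the support threshold is given an infinite support ratio by convention.

```python
import math
from fractions import Fraction
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Items common to all transactions that contain `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


def support_ratio(xy, transactions, tau=0):
    """s(XY) over the largest support, above tau, of a proper superset of XY."""
    universe = frozenset().union(*transactions)
    # Support is antimonotone, so one-item extensions suffice to find the maximum.
    best = max((support(xy | {a}, transactions) for a in universe - xy), default=0)
    if best <= tau:
        return math.inf  # assumed convention: no qualifying superset
    return Fraction(support(xy, transactions), best)


def closed_proper_subsets(x, transactions):
    """All closed proper subsets of the closed set x."""
    found = set()
    for size in range(len(x)):
        for combo in combinations(sorted(x), size):
            c = closure(frozenset(combo), transactions)
            if c < x:
                found.add(c)
    return found


def passes_boost(x, y, transactions, b, tau=0):
    """True if the closure-based boost of X -> XY exceeds b (X and XY closed)."""
    xy = x | y
    if support_ratio(xy, transactions, tau) <= b:
        return False
    own = Fraction(support(xy, transactions), support(x, transactions))
    for z in closed_proper_subsets(x, transactions):
        other = Fraction(support(z | y, transactions), support(z, transactions))
        if own <= b * other:
            return False
    return True


# Hypothetical toy dataset: ab and abc are closed, and the only closed
# proper subset of ab is the empty set.
TOY = [frozenset("abcd"), frozenset("abc"), frozenset("ab"),
       frozenset("c"), frozenset("d"), frozenset("d")]
X, Y = frozenset("ab"), frozenset("c")
print(passes_boost(X, Y, TOY, 1.2), passes_boost(X, Y, TOY, 1.5))
```

On this toy data the rule ab → abc has confidence 2/3 against 1/2 for ∅ → c and a support ratio of 2, so it passes a threshold of 1.2 but fails at 1.5 (second test) and at 2.5 (first test).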



5. EMPIRICAL VALIDATION 

This section describes the outcomes of several empirical applications of the notions of con-
fidence boost; the next section describes a complete tool that employs closure-based con-
fidence boost, and the properties we have developed, to offer parameter-less association 
mining. With respect to specific datasets, we report first on objective figures: numbers of 
rules passing rather mild confidence boost thresholds on three datasets, all consisting of 
real world data, but of very different characteristics. Subsequently, we briefly discuss the 
much more difficult and subjective question of whether the rules that we find are actually 
the rules one may want. 

5.1. Quantitative Evaluation 

Dataset Adult is the training set part of the Adult US census dataset from UCI [Asun- 
cion and Newman 2007]. Dataset Retail was downloaded from the FIMI repository, and 
contains typical market basket data (http://fimi.cs.helsinki.fi/); and dataset Now 
(based on the Neogene of the Old World dataset, public release 030710 [Fortelius 2003]) is 
a transactional version of a paleontological dataset from Europe: we downloaded and
slightly preprocessed the file NOW_public_030710.xls, so that each paleontological site has been
cast into a transaction, where the items in the transactions are the species of which fossil
remains have been found at that site. Additional information such as the name or geographical
position of the site has been omitted, in order to keep the transactional format.
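As an illustration of this kind of preprocessing, the sketch below groups (site, species) records into one transaction per site and computes the three summary quantities of the sort reported for each dataset in Table I. The record layout, the function names, and the sample values are hypothetical; the actual layout of the spreadsheet is not described here.

```python
from collections import defaultdict


def records_to_transactions(records):
    """Group (site, species) pairs into one itemset per site."""
    by_site = defaultdict(set)
    for site, species in records:
        by_site[site].add(species)  # a set drops repeated records for a site
    return [frozenset(items) for items in by_site.values()]


def summarize(transactions):
    """Return (number of transactions, number of distinct items, item occurrences)."""
    size = len(transactions)
    items = len(set().union(*transactions))
    occurrences = sum(len(t) for t in transactions)
    return size, items, occurrences


# Made-up sample records, for illustration only.
records = [
    ("site1", "Hipparion"), ("site1", "Mammuthus"),
    ("site2", "Hipparion"), ("site2", "Hipparion"),
    ("site3", "Ursus"),
]
transactions = records_to_transactions(records)
print(summarize(transactions))
```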

Table I gives some information about the datasets: their size (in number of transactions), 
the number of items involved, and the total of item occurrences. Each dataset has been 
mined at two different levels of support and three different levels of confidence. Support 
thresholds were chosen so as to produce noticeable numbers of rules, and also to make 
sure that the closure spaces were nontrivial in size (several thousand closures). Table II 
reports, for each pair of support and confidence values, the basis size (RR/B*, standing for 
representative rules and B* basis respectively) and then the number of these basis rules, for 



Table I. Information about datasets

Dataset    Size     Items    Occurrences
Adult      32561    269      358171
Retail     88162    16470    908576
Now        1597     3873     14135
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Table II. Sizes of RR/B* bases at confidence boosts 1 to 1.3 
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2011 / 822 
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> 1.20 


108 / 14 
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466 / 359 


526 / 14 


1285 / 466 


2049 / 1090 


> 1.25 


91/9 
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1991 / 1051 
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76/4 
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404 / 308 


473 / 4 


1158 / 384 


1842 / 929 



each basis, passing the corresponding confidence boost thresholds as given. Of course, for 
the B* case we bound the closure-based confidence boost. 

Our implementation was not particularly aimed at speed. Still, for instance, computing 
all the figures regarding the representative rule basis took less than 35 minutes on a low- 
range laptop. For the higher support threshold in each dataset, each computation time was 
between 20 and 45 seconds. For the larger, more demanding closure lattice at the lower 
support threshold of each dataset, these figures required between 2 minutes and up to a 
maximum of 6 minutes. It will not be difficult to improve the running times in future work, 
as a number of known accelerations can be applied; we are already undertaking this task. 
Computationally, the slowest part was always the construction of the closure lattice. 

With respect to the outcome, we see that the reduction of the number of rules is clear, 
and in some cases it is very considerable. Recall that the bound at 1 of the confidence boost 
discards those basis rules for which a rule with higher confidence can be obtained by either 
reducing the antecedent, enlarging the consequent, or both; in the first case, it would mean 
that the rule is actually a case of negative correlation that is better left off from the output. 

5.2. Subjective Evaluation 

Quantitatively, the figures just given imply that large fractions of representative rules are
somewhat uninteresting in that they fully lack any novelty, measured according to confidence
boost. However, one may question whether the actual rules passing the thresholds are "the
right ones". To our subjective perception, after seeing the outcome of our experiments,
the whole process makes a lot of sense, but, in order to argue that indeed bounding the
confidence boost leads to a worthy data mining scheme, we should find a more convincing
argumentation. We hasten to add here that using the mined rules for classification will

Table III. Number of rules passing closure-based confidence boost bounds

Conf.   1     1.05  1.1   1.15  1.2   1.25  1.3   1.35  1.4   1.45  1.5
70%     948   824   689   554   417   331   247   175   142   112   85
75%     639   541   444   356   266   212   161   112   97    76    56
80%     367   298   231   182   132   101   78    54    43    36    26


Table IV. Abbreviations of subjects for Tables V and VI below

subject:BC    Brain-Computer Interfaces
subject:CI    Computational, Information-Theoretic Learning with Statistics
subject:IR    Information Retrieval and Textual Information Access
subject:LS    Learning/Statistics and Optimisation
subject:MV    Machine Vision
subject:TA    Theory and Algorithms



not provide a reasonable evaluation, since for such applications we must focus on single
pairs of attribute and value as right-hand side, which makes it useless to consider larger
right-hand sides; and, also, the classification will only be sensitive to minimal left-hand sides
independently of their confidences (as in Subsection 7.2 below). Because of these properties,
a classification task is not fine enough to provide information about the usefulness of the
subtler confidence quotients involved in the confidence boost bounds.

Clearly, the difficulty of this evaluation lies in the fact that the issue is largely subjec- 
tive. At the present moment, our way through is to involve "end-users" in the evaluation 
of the obtained association rules: persons that are extremely well-versed on the dataset at 
hand. Both for our version of confidence boost, and for a sensible extension of it to handle 
absence of items besides presence of items in the transactions, we are developing an anal- 
ysis of educational datasets, containing information about online courses on multimedia 
systems and on the Linux operating system, in close cooperation with the teachers of said 
courses [Balcazar et al. 2010a]. Here, however, instead of looking for experts on a given
dataset, we use a dataset for which some readers of this paper might be expected to be rea- 
sonably knowledgeable: in the same vein as the evaluations in [Gallo et al. 2007], we employ 
the titles, topics, and abstracts of all the reports submitted to the e-prints repository of 
the Pascal Network of Excellence along its early years of existence. This dataset, extracted 
from the repository by Professor Steve Gunn, was the object of a visualization challenge 
of the Pascal Network in 2006. (Professor Gunn has also kindly furnished to this author a 
similar but much larger dataset, to which we plan to apply the same scheme in the near 
future.) 

The collection of papers was processed starting from a plain text file containing one line 
for each of the 721 papers, including the title, the subjects chosen from among the specific 
choices allowed by the repository (marked by a ' !' sign that we changed into the word "sub- 
ject"), and the whole text of the abstract of the report. The (mild) preprocessing consisted 
in removing punctuation and nonprintable characters, mapping all letters into lowercase, 
stripping off stop words as per the list from www.textfixer.com, and removing duplicate 
words from each of the transactions so obtained. This left 45185 total word occurrences cho- 
sen from a vocabulary of 8233 items. We checked the size of the closure space at supports of 
10% (135 closures) and 5% (830 closures, still somewhat small), and then at 1% (too large, 
as after a few minutes the program was still computing the closure lattice's edges — in fact, 
a later run showed that it consists of 59713 closures). We settled for a far from trivial but 
manageable closure space consisting of 9621 closed itemsets obtained at 2% support. Then, 
we computed the B* basis at confidences 70% (1070 rules), 75% (729 rules), and 80% (412
rules), and cut them down by filtering them at closure-based confidence boosts of 1, 1.05, 
1.1, 1.15, 1.2, 1.25, 1.3, 1.35, 1.4, 1.45 and 1.5. All the runs were almost instantaneous. The 
figures obtained, given in Table III, make it indeed possible to proceed to manual inspection 
of many of these options. 
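The mild preprocessing described above can be sketched in a few lines. This is an illustration under assumptions, not the script actually used: the stop-word set below is a tiny placeholder rather than the list from www.textfixer.com, punctuation is replaced by spaces (it could equally be deleted), the handling of the subject markers is omitted, and the function name is hypothetical.

```python
# Placeholder stop words; the list actually used is not reproduced here.
STOP_WORDS = frozenset({"the", "of", "a", "an", "and", "for", "in", "to", "is"})


def line_to_transaction(line, stop_words=STOP_WORDS):
    """Turn one line of text into a transaction (a set of distinct words)."""
    # Replace punctuation and nonprintable characters by spaces.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in line)
    # Map to lowercase and split on whitespace.
    words = cleaned.lower().split()
    # Strip stop words; building a set removes duplicate words.
    return frozenset(w for w in words if w not in stop_words)


example = line_to_transaction("Support Vector machines: the SUPPORT of kernels!")
print(sorted(example))
```

Each line of the input file then yields one transaction, and the total number of word occurrences is the sum of the transaction sizes.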
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Table V. The 26 rules at 2% support, 80% confidence, 1.5 boost 



conf.   supp %   rule

0.842   2.219    principal => component
0.842   2.219    unlabeled => data
0.882   2.080    approach method show => data
0.850   2.358    features selection => feature
0.842   2.219    methods subject:MV => images
0.833   2.080    nonlinear subject:LS => learning
0.810   2.358    kernel used => method
0.889   3.329    presents => paper
0.833   2.080    solve => problem
0.941   2.219    art => state
0.800   2.219    brain => subject:BC
0.914   4.438    document => subject:IR
0.907   5.409    documents => subject:IR
0.826   2.635    web => subject:IR
0.900   2.497    feature learning => subject:LS
0.850   2.358    features subject:TA => subject:LS
0.842   2.219    linear problem => subject:LS
0.833   2.080    data second => subject:LS
0.818   2.497    data subject:MV => subject:LS
0.818   2.497    more use => subject:LS
0.919   4.716    object => subject:MV
0.895   4.716    bound => subject:TA
0.889   5.548    bounds => subject:TA
0.818   2.497    graphs => subject:TA
0.813   3.606    variables => subject:TA
0.813   10.264   support => vector


Next, as a particular case, we chose to perform an examination of the 26 rules found at 
2% support, 80% confidence, and 1.5 (closure-based) confidence boost, which revealed rules 
with little or no redundancy among themselves, all of them semantically sensible, and with 
a handful of them actually quite interesting (for this author). The whole process leading 
to these "nuggets" lasted less than two hours, including all the preprocessing, for a single 
person (the author) and quite limited computing power (an old Centrino Solo laptop) . These 
rules are given in Table V. The predefined subjects of the e-prints Pascal server appearing 
in the table have been shortened to fit the page; Table IV reports the abbreviations used 
for them in Tables V and VI. 

By way of comparison, at the same level of support, at the most demanding possible
level of confidence (100%), with the least redundant basis computation currently known
(the Guigues-Duquenne basis, [Guigues and Duquenne 1986]), the result is 44 rules, with
considerably more "intuitive redundancy" and less interest overall, and requires somewhat
longer time to be computed. Note that, by their own definition, the rules in the B* basis do
not attempt to capture rules with 100% confidence, but just to complement them with
partial rules; hence, the Guigues-Duquenne basis has some additional information. For the
sake of comparison, this basis is given in Table VI. The considerable redundancy is clear:
many variants of "support" implies "vector" become reduced to a single one under the
confidence boost bound. One may ask why the similar case of "vector" implies "support"
is missing from the list of 26 rules: the answer is that its confidence is slightly under 75%
and, thus, it is not reported under the 80% threshold. Once more we see that setting the
thresholds with no formal guidance is a very risky process. It would be necessary to
try and help the user by some sort of self-adjustment of the thresholds. We have attempted
a first approach along this line, which is reported next.
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Table VI. The 44 implications in the Guigues-Duquenne basis at 2% support 



supp %   rule

2.358    al => et
2.219    machine models => learning
2.219    subject:LS support svms vector => machines
2.358    hidden markov => models
2.080    bci => subject:BC
2.080    eeg => subject:BC
2.080    collections => subject:IR
2.219    document paper => subject:IR
2.358    documents paper => subject:IR
2.358    document new => subject:IR
2.497    document documents => subject:IR
2.774    document information => subject:IR
2.080    data results vector => subject:LS
2.497    data learning problem set => subject:LS
2.080    object results => subject:MV
2.219    image images subject:LS => subject:MV
2.358    image object => subject:MV
2.358    images recognition => subject:MV
2.358    object recognition => subject:MV
2.635    images results => subject:MV
2.774    images object => subject:MV
2.219    algorithm generalization => subject:TA
2.358    bound subject:LS => subject:TA
2.080    based machines vector => support
2.080    machines used vector => support
2.080    paper show vector => support
2.080    classification machine vector => support
2.080    learning svm vector => support
2.219    machines svm vector => support
2.358    kernel machines vector => support
2.497    machines method vector => support
2.497    machines paper vector => support
3.467    machines using vector => support
2.774    machines such vector => support
2.358    machines svms => support vector
2.080    method problem support => vector
2.080    new subject:TA support => vector
2.219    support well => vector
2.497    machines methods subject:LS => vector
2.635    support svms => vector
2.635    learning machines subject:LS => vector
2.913    machines subject:LS subject:TA => vector
3.606    kernel support => vector
6.380    machines support => vector


6. TOWARDS PARAMETER-FREE ASSOCIATION MINING 

In this section we describe an open-source software tool that profits from closure-based 
confidence boost and its properties to offer a sensible association mining process, while 
refraining from asking the user to select any value of any parameter: our system yacaree 
(Yet Another Closure-based Association Rule Experimentation Environment), a proof-of- 
concept currently implemented fully in pure Python. It combines several processes using lazy 
evaluation by means of the functional programming facilities available in current versions 
of Python to mine high-boost B* association rules. Its key property is the self-tuning of the 
support and the confidence boost thresholds. 

As in most current proposals, yacaree mines only frequent closed itemsets; initially, it 
enforces a support bound that starts ridiculously low (namely, at 5 transactions). In most 
applications, one cannot rely on mining all frequent closures at this threshold: this might or
might not be possible, depending on the dataset; therefore, along the process, the threshold
will be automatically increased. Frequent closures are mined via a simplified variant of 
ChARM [Zaki and Hsiao 2005], rather close to a depth-first search but with the proviso 
that closed itemsets are produced in order of decreasing support, so that increasing the 
support threshold does not invalidate the closures found so far. 

This idea is reminiscent of the decreasing support in the version of "apriori" implemented 
in the Weka tool [Witten and Frank 2005], but in that well-known system the user still has
to provide maximum and minimum values for the support threshold, and a "delta"
by which the support keeps decreasing; then, the "apriori" algorithm is run repeatedly for 
the corresponding sequence of support thresholds. Further, the process stops when a given 
number of rules, also chosen by the user, has been found. This makes it unlikely to find rules 
of low support. The "predictive apriori" alternative, present in that tool as well [Scheffer 
2005; Witten and Frank 2005], also attempts to adjust the support, by balancing it with
respect to confidence. Our system works very differently, as it is able to mine closures in 
order of decreasing support by its own algorithmics, and self-adjusts the internal effective 
support bound on the basis of technological limitations, in a manner that is autonomous 
and independent of the confidence or of any other parameter of the mining process. 

The closed set miner takes the form of an iterator, and searches for the next closed set 
to be reported only when asked to do so. Each closure found is analyzed, upon yielding it 
to the next phase, to see whether it can be further extended without failing the current 
support threshold, and all those extensions, with their explicit supporting transaction lists, 
are added to a heap which provides instantaneously the largest-support closed set that has 
not been extended so far. 
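
The behavior just described can be sketched in a few lines of Python. This is only an illustration under our own simplifying assumptions: it is a naive best-first search, not the simplified ChARM variant used in yacaree, and the function name, the representation of transactions as sets, and the tie-breaking counter are all hypothetical. It does show the two key points: the miner is a generator that does no work until asked, and a heap keyed on support always delivers the largest-support closure not yet extended.

```python
import heapq
from itertools import count


def closures_by_decreasing_support(transactions, min_support=1):
    """Lazily yield (closed_itemset, support) in order of non-increasing support.

    Naive illustrative sketch: closures are computed by intersecting
    transactions, which is far slower than a ChARM-style miner.
    """
    transactions = [frozenset(t) for t in transactions]
    if not transactions:
        return
    all_tids = frozenset(range(len(transactions)))
    items = sorted(set().union(*transactions))
    # Explicit supporting transaction list for every single item.
    tidlist = {i: frozenset(t for t in all_tids if i in transactions[t])
               for i in items}

    def closure(tids):
        # The closure of an itemset is the set of items shared by all
        # the transactions that support it.
        tid_iter = iter(tids)
        common = set(transactions[next(tid_iter)])
        for t in tid_iter:
            common &= transactions[t]
        return frozenset(common)

    tie = count()  # tie-breaker so the heap never compares itemsets
    start = closure(all_tids)
    heap = [(-len(all_tids), next(tie), start, all_tids)]
    seen = {start}
    while heap:
        neg_supp, _, closed, tids = heapq.heappop(heap)
        yield closed, -neg_supp
        # Only now, upon yielding, do we compute the extensions.
        for item in items:
            if item in closed:
                continue
            new_tids = tids & tidlist[item]
            if len(new_tids) < min_support:
                continue
            new_closed = closure(new_tids)
            if new_closed not in seen:
                seen.add(new_closed)
                heapq.heappush(
                    heap, (-len(new_tids), next(tie), new_closed, new_tids))
```

Since every extension has support no larger than the closure it extends, popping from the heap never produces a closure of larger support than one already reported; this is what allows the support threshold to be raised midway without invalidating earlier output.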

The closures are passed on to a lattice constructor, a "border" algorithm which computes 
the lattice structure, so that immediate predecessors of each closed set are readily available, 
as it is convenient for computing the basis B* . The lattice constructor itself is based on 
[Baixeries et al. 2009] and works also as an iterator, constructing Hasse edges only when 
they are needed. Rules are, then, constructed from the lattice. Closures and candidate rules 
are either discarded, if we can guarantee that future threshold adjustments will never recover 
them; or processed, if they obey the thresholds; or maintained separately on hold, if they 
fail the current thresholds but might turn to obey them after future adjustments. 

The support threshold changes along the process. It starts, as indicated, at an almost 
trivial level, and grows, if necessary, as the monitoring of the mining process reveals
that the memory consumption surpasses internal thresholds. More precisely, the heap where 
unexpanded closures are stored is considered in overflow when either its length, or the 
total memory it uses, or the sum of the lengths of the associated support lists, exceeds 
a corresponding predefined threshold. At that point, the minimal support constraint is 
recomputed and raised as necessary so that the exploration can continue. In this way, both 
the risk of entering a huge closure space, and the risk of memory overflow upon computing 
the supports of the closed sets (as sometimes happens for dense datasets) are avoided. 
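
A minimal sketch of such an overflow check follows. The text above does not specify how the new support bound is recomputed, so the policy used here (raise the bound to the median support among pending closures, and by at least one transaction) is purely an assumption for illustration, as are the function name, the default limits, and the layout of heap entries as `(-support, tiebreak, closed_set, tids)`. The memory-size criterion is omitted for brevity.

```python
import heapq


def relieve_overflow(heap, min_support, max_entries=1000, max_total_tids=100000):
    """Raise the support bound until the heap of pending closures fits.

    Returns the (possibly increased) support bound and prunes `heap`
    in place. Hypothetical policy, not taken from the yacaree source.
    """
    def overflowing(h):
        too_long = len(h) > max_entries
        too_many_tids = sum(len(entry[3]) for entry in h) > max_total_tids
        return too_long or too_many_tids

    while heap and overflowing(heap):
        supports = sorted((-entry[0] for entry in heap), reverse=True)
        # Assumed policy: keep roughly the better half of pending closures.
        candidate = supports[len(supports) // 2]
        min_support = max(min_support + 1, candidate)
        heap[:] = [entry for entry in heap if -entry[0] >= min_support]
        heapq.heapify(heap)
    return min_support
```

Each pass raises the bound by at least one, so the loop terminates; closures already reported remain valid because they were produced in order of decreasing support.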

We impose a very mild confidence threshold that remains fixed, letting large quantities 
of rules pass; but we control the number of rules to be provided to the user via a threshold 
on the closure-based confidence boost, which is adjusted also along the run. We use the 
approximation to the confidence boost provided by the support ratio (Proposition 4.6) to 
push the confidence boost constraint into the mining process, and we use the lift, applied 
to the particular cases to which Corollary 4.18 applies, to self-adjust the boost threshold. 

In fact, as the Hasse edges of the closures lattice are identified, the support ratio can be
computed easily. If it is lower than the current confidence boost threshold, the closure is not
adequate to yield high boost rules, but it could become so if, in the future, the confidence
boost threshold decreases. Therefore, the confidence boost constraint is partially "pushed
into" the mining process by temporarily omitting the expansion of such closed sets. Instead,
they are maintained separately in a dedicated data structure, from where they are "fished
out" again in case a decrease of the boost bound promotes them to candidate closures for
creating high-boost rules. We take advantage of the support ratio constraint also to compute
the confidence boost of rules, as per Algorithm 3: we know that, if the closed set reaches
that stage, then its support ratio is high enough, so we do not need to test it again.
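
The holding structure can be pictured as follows. This is a hedged sketch, not the data structure of yacaree: the class name `BoostGate` and its methods are hypothetical, and the support ratio of each closure is assumed to be computed elsewhere and passed in as a number.

```python
class BoostGate:
    """Hold back closures whose support ratio is below the boost bound."""

    def __init__(self, boost_bound):
        self.boost_bound = boost_bound
        self.on_hold = []  # pairs (support_ratio, closure)

    def admit(self, closure, support_ratio):
        """Return True if the closure may be expanded now; else hold it."""
        if support_ratio >= self.boost_bound:
            return True
        self.on_hold.append((support_ratio, closure))
        return False

    def lower_bound(self, new_bound):
        """Decrease the boost bound and return the closures it revives."""
        self.boost_bound = min(self.boost_bound, new_bound)
        revived = [c for r, c in self.on_hold if r >= self.boost_bound]
        self.on_hold = [(r, c) for r, c in self.on_hold
                        if r < self.boost_bound]
        return revived
```

For instance, with an initial bound of 1.15, a closure of support ratio 1.1 is held; lowering the bound to 1.08 returns it for expansion, whereas a closure of ratio 1.02 stays on hold.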

The mining process starts with a somewhat demanding confidence boost bound, which
requires a rule to have at least 15% more confidence than any of the rules participating in
its confidence boost in order to qualify as interesting. In some datasets, this figure is not
that restrictive, and dozens of rules still make it. By default, the system writes out as its
result up to 50 rules of highest boost.

In many datasets, though, that confidence boost bound is too demanding. The program 
monitors the lift of rules having one single item as antecedent and obtained from a closed 
set that has support ratio above the confidence boost bound (cf. Corollary 4.18). If these 
lift values keep decreasing, they enter a weighted average with the current confidence boost 
bound and may decrease it. In this way, we track the degree of correlation empirically found 
in the dataset to reduce conveniently the confidence boost bound. There is a static limit to 
this boost bound: it is never allowed to drop below 1.05. (All the hardwired limits can be
modified easily in the same module statics.py of the source code.)
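
The adjustment can be sketched as below. The floor of 1.05 and the initial bound of 1.15 come from the description above; the weight of the average, the function names, and the use of absolute counts for supports are our own assumptions for the sake of a runnable illustration, and may differ from the values in statics.py.

```python
def lift(supp_xy, supp_x, supp_y, n_transactions):
    """Lift of X -> Y from absolute support counts."""
    return (supp_xy * n_transactions) / (supp_x * supp_y)


def adjust_boost_bound(current_bound, observed_lift, weight=0.2, floor=1.05):
    """Blend an observed lift into the boost bound; never go below `floor`.

    `weight` is a hypothetical value: the share of the observed lift in
    the weighted average. The bound can only decrease.
    """
    if observed_lift >= current_bound:
        return current_bound
    blended = (1 - weight) * current_bound + weight * observed_lift
    return max(floor, blended)
```

With these assumed settings, a bound of 1.15 meeting a single-antecedent rule of lift 1.05 moves to 0.8 * 1.15 + 0.2 * 1.05 = 1.13, while a rule of lift 1.30 leaves it unchanged.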

The result is a functional preliminary system, where ample room still remains for efficiency 
and algorithmic improvements, which shows that it is possible to find interesting association 
rules in a fully autonomous manner: the user simply selects a dataset and launches the 
process, which takes just one to five minutes in many easy datasets, and up to ten to twenty 
minutes on a modern laptop for a few difficult, highly dense datasets. The output is a 
set of rules which, in most cases, is reasonably small and shows independent and sensible 
associations. 

The open source, plus some example datasets, can be downloaded from 
http://sourceforge.net/projects/yacaree/; these example datasets are already 
preprocessed into transactional form, and come from [Asuncion and Newman 2007] or 
[Fortelius 2003], or from the e-prints repository of the Pascal Network of Excellence. The 
screenshot provided in Figure 2 shows the simple interface (button "Run" is disabled as 
the system has just been run) and the two text files generated: the log, where we can see
that the process took a bit over five minutes, and the start of the file containing the rules 
found. Both the console and the log indicate the self-adjustments of the support; along this 
particular run, no adjustment was performed on the boost threshold, as enough high-boost 
rules were found for its initial value. 

7. DISCUSSION 

The main contribution of this paper is the closure-based confidence boost: a new concept 
that measures a form of objective novelty for association rules, which we have studied from 
the formal and algorithmic perspective and which we have used to construct open source 
association mining tools. 

Our starting point was the study of notions of redundancy in a "logical" spirit. When a 
rule is irredundant, we still can use relative confidences to assess the degree of irredundancy, 
which we see as a potentially useful formalization of objective novelty. 

A redundancy due to larger consequents can be measured by the support ratio; as such,
both earlier notions like confidence width and our new proposals are related to it. A
redundancy due to smaller antecedents is handled appropriately by the preexisting
confidence width only in some cases, due to the stringent condition of "logical" redundancy;
with the also preexisting notion of blocking, the case of smaller antecedents is handled in a
less strict, more intuitively useful way. A bound on the simplest of the two versions of
confidence boost is exactly equivalent to bounding both preexisting notions, width and blocking;
therefore, our first new proposal allows for much smoother handling of the combination of
the previously studied concepts.

Fig. 2. A screenshot of yacaree with the rules and log output files



As the notion of plain confidence boost turns out to be debatable for one specific "closure-aware"
basis, the B* rules, we have also proposed a more sophisticated "closure-aware"
version of the confidence boost, for which we have developed the corresponding formal and
algorithmic study.

An obvious drawback of using a confidence boost bound is the need to choose yet another
parameter for the mining process, besides confidence and support. However, in our
experiments, this problem did not seem to be that serious: a noticeable aspect of the confidence
boost bound is that the outcome of the mining shows rather low sensitivity both
with respect to its precise value and with respect to the values of other parameters such as
confidence: quite similar sets of rules are obtained. We quickly learned to use two standard
values, 1.05 to prune off just really low novelty rules and 1.2 to prune more aggressively;
whereas, in case the dataset still gives many rules above this threshold, occasionally
we would employ the very drastic value of 1.5. This scheme tends to work well, and not only
that: it also makes the choice of the confidence threshold less critical, as it can be safely left
at a somewhat low value (say, around 0.6 to 0.7), leaving to the boost parameter the task
of reducing the output size. These empirical facts were widespread to such an extent that
we attempted to use (closure-based) confidence boost to construct a parameter-free
association miner: the yacaree system, able to self-tune the closure-based confidence
boost and the support thresholds. We believe that the embodiment of the computation of
the B* basis together with closure-based confidence boost bounds in an open source tool will
promote its use in data mining practice, as yacaree exhibits a unique quality of "turnkey"
system that works with just the few clicks needed to choose the input dataset. Of course,
it can be used as well in the standard manner, as the default initial values of confidence,
support, and other internal parameters can be manually tuned effortlessly, if necessary, by
data mining experts. However, this action is no longer necessary, as yacaree is ready to
do its best with no need of user choices. The system is platform independent, although in a
system with small memory, the control of the heap size may require some initial tuning (to
be made just once) to avoid runtime errors for lack of memory; whereas, in very powerful
systems, getting the most out of them may also require some tuning.

The shortcomings of confidence thresholds discussed at the beginning of Subsection 2.2
have been often interpreted as an inadequacy of the very notion of confidence. Yet, we 
prefer to develop our proposal in the context of support and confidence bounds, for several 
reasons. 

First, conditional probability is a concept known to many educated users from a number 
of scientific and engineering disciplines, so that communication between the data mining 
expert and the domain expert is often simplified if our measure is confidence. Second, as 
a very elementary concept, it is the best playground to study other proposals, such as our 
contribution here, which could be then lifted to other similar parameters. 

Third, and more importantly, we believe that, in fact, our approach of complementing it 
with relative measures will make up for many of the objections raised against confidence. 
In fact, our interpretation of this sort of objection is not the widespread conclusion that
"confidence is inappropriate" to filter and rank association rules, but that "an absolute
threshold on confidence is inappropriate" to filter and rank association rules. This does not 
mean that it has to be replaced as a measure of intensity of implication, and, in fact, it 
has been observed and argued that (at least in somewhat sparse transactional datasets) 
the combination of support and confidence is already very good at discarding rules that 
are present only as statistical artifacts and do not really correspond to correlations in the 
phenomenon at the origin of the dataset [Megiddo and Srikant 1998]; instead, we consider 
that our message is that it should be complemented with relative confidence thresholds that 
assess the novelty of each rule by comparison with the confidence of logically (or intuitively) 
stronger rules. The identification of the precise notion for this task is a clear research issue, 
to which we have contributed via our two variants of the notion of confidence boost. 

A number of connected approaches to association rule quality exist in the literature. We 
discuss here those that we have found most closely related; Subsection 7.2 is devoted to the 
deeper analysis of a particularly close contribution. We finish the paper with a description 
of forthcoming work. 

7.1. Comparisons to Related Work 

We refer to [Geng and Hamilton 2006] for an excellent survey of many options to relate 
supports of left and right hand sides of association rules to construct indicators of interest- 
ingness. Many of these only work on a single rule, with no reference to alternative rules with, 
say, smaller but otherwise arbitrary left-hand sides. A notable case is lift, which implicitly 
refers to a rule with the same right-hand side and an empty left-hand side, as discussed in 
the proof of Proposition 3.4. Compared to this family of measures, confidence boost is finer 
as it can distinguish among many alternative antecedents to compare, at the price of being 
potentially more expensive to evaluate due to the search for smaller but arbitrary left-hand 
sides, and larger but arbitrary right-hand sides. We have shown several algorithms that
attempt to circumscribe this search to smaller spaces.
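
To make the remark on lift explicit: with supports taken as relative frequencies, so that $s(\emptyset) = 1$, the usual definition of lift can be rewritten as a ratio of confidences, where the competing rule has the same right-hand side and an empty left-hand side. We state this standard identity here for convenience only; it is not quoted from the proof of Proposition 3.4.

```latex
\[
  \mathrm{lift}(X \to Y)
  \;=\; \frac{s(XY)}{s(X)\, s(Y)}
  \;=\; \frac{s(XY)/s(X)}{s(Y)/s(\emptyset)}
  \;=\; \frac{c(X \to Y)}{c(\emptyset \to Y)}
\]
```

In this form, lift compares a rule against one fixed alternative, whereas confidence boost compares it against a maximum taken over many alternative rules.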

More sophisticated interestingness measures are possible, for instance those based on 
the KL-divergence between probability distributions induced with and without the given 
rule [Jaroszewicz and Simovici 2002]: the induced distributions satisfy the supports of the 
rule and of its antecedent but otherwise maximize the entropy. In preliminary tests, our 
approach, with quite robust settings of confidence (between 0.6 and 0.7) and boost (standard
threshold of 1.2) gives results very close to those in [Jaroszewicz and Simovici 2002]. 

Several published works attempt at a similar detection of the "exceptionality" or "surpris- 
ingness" of rules; many of these work in the relational setting, instead of the transactional 
setting where our work fits. Relational data can be analysed in the transactional setting 
by converting a pair given by an attribute name and a value for the attribute into a single 
item, as we do in the Adult dataset in Table II. Assuming the relational structure of the 
data, however, brings in the extra power of "implicit negation" of attributes, due to the 
incompatibility among simultaneous values of the same attribute. This implicit negation is
useful to explain novelty by comparing more specific rules stating a consequent of the form
$A = V$ to more general rules stating a consequent of the form $A = V'$ for $V \neq V'$, and
quite interesting results along this line can be found in [Padmanabhan and Tuzhilin 2000; 
Suzuki 1997; Suzuki and Kodratoff 1998], among others. Our purely transactional setting 
(like for the Retail or Now datasets) does not allow us to employ this method of implicit 
negation and, therefore, such contributions are not directly comparable to ours. 

A few additional contributions that still lie in the transactional setting and are similar to 
ours are discussed next. The notions of confidence width and rule blocking from [Balcazar 
2009] are similar to the "pruning" proposal from [Liu et al. 1999], in that the intuition is the 
same; also our proposal here follows an analogous intuitive path. Major differences are that,
in the proposals we discuss, a large portion of the pruning becomes unnecessary because we
work on minimum-size bases, namely representative rules, and, more importantly, that the
pruning in [Liu et al. 1999] is based on the $\chi^2$ statistic, whereas we look instead into
the confidence thresholds that would make the rule "redundant", either in a "formal logic"
sense or in a more intuitive, but still logical-style relaxation. Our notions are also similar
to the notion of improvement, proposed in [Bayardo et al. 1999] and also discussed in [Liu 
et al. 1999; Webb 2007]; but improvement is a measure of an absolute, additive confidence 
increase, with no reference to representative rules or redundancy, and it only allows for 
varying the antecedent into a smaller one, keeping the same consequent. 
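
For comparison, improvement can be computed by brute force as in the sketch below. It follows the usual formulation of improvement as we understand it from [Bayardo et al. 1999]: the minimum difference between the confidence of a rule and that of any rule with the same consequent and a proper subset of the antecedent. The function names and the representation of the dataset as a list of Python sets are assumptions made for the example; the antecedent must be nonempty and have positive support.

```python
from itertools import combinations


def support(transactions, itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, antecedent, consequent):
    joint = support(transactions, antecedent | consequent)
    return joint / support(transactions, antecedent)


def improvement(transactions, antecedent, consequent):
    """Additive confidence gain over the best proper sub-antecedent."""
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    own = confidence(transactions, antecedent, consequent)
    # Proper subsets only: sizes 0 .. len(antecedent) - 1.
    best_simpler = max(
        confidence(transactions, frozenset(sub), consequent)
        for k in range(len(antecedent))
        for sub in combinations(antecedent, k)
    )
    return own - best_simpler
```

Note that the consequent is never enlarged and the result is a difference rather than a ratio, which are the two points of contrast with confidence boost mentioned above.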

7.2. Minimum Antecedent and Maximum Consequent 

Many works suggest further notions of redundancy, in most cases based upon mere intuition. 
The fact that a rule $X \to XY$ is redundant with respect to $X \to XY'$ whenever $Y \subset Y'$
(in the sense of having at least the same confidence) is pointed out in many places
(e.g. [Aggarwal and Yu 2001; Kryszkiewicz 1998b; Phan-Luong 2001; Shah et al. 1999]).
Our starting point being the representative basis, we would only keep $X \to XY$ if its
confidence is higher than that of $X \to XY'$ by a factor indicated by the confidence boost;
this quantification is an effective refinement of that known proposal.

On the other hand, redundancy of $X \to XY$ with respect to $X' \to X'Y$, where $X' \subset X$, is
debatable. As we have already discussed in Subsection 2.2, rules $X \to XY$ and $X' \to X'Y$,
where $X' \subset X$, provide different, orthogonal information. Still, one may wish to forget
about $AB \to C$ if $A \to C$ is already present; this seems a natural attitude, and, in fact,
explicit proposals of removing the seemingly redundant rule appear in many references,
often jointly with the (correct) observation of redundancy due to larger consequents. This
happens in the structural cover of [Toivonen et al. 1995], and in some of the pruning rules of
[Shah et al. 1999] (which focuses on a slightly different approach since their main measure is
actually lift, but, in fact, most of their developments work for confidence as well); and also
in [Scheffer 2005]. All these proposals may make sense as heuristics, and their connection
to confidence boost is developed below; however, if taken as redundancy statements,
they are incorrect and, in some cases where a precise mathematical statement and its proof
are provided (like [Scheffer 2005]), the proof can be seen to switch into a "full implication"
meaning of the "arrow" connective, and is therefore actually wrong, since it does not
apply to partial rules. Discarding the apparently weaker rule requires more care and a finer
discussion and, actually, the confidence boost provides for this.

In fact, without pretending to argue redundancy, one could consider rules with minimal
antecedent and maximal consequent simply as a heuristic for handling a large set of
mined rules, acting as a sort of summary of rules with larger antecedents or shorter
consequents, or both. As a representative of these proposals, we chose to discuss the approach
of [Kryszkiewicz 1998c], which can be cast as follows:

Definition 7.1. For a fixed confidence threshold $\gamma$ and a fixed support threshold $\tau$, the
minimal-antecedent, maximal-consequent rules $\mathrm{MMR}_{\tau,\gamma}$ are those rules $X \to XY$ (with
$X \cap Y = \emptyset$) such that $c(X \to XY) \geq \gamma$, $s(X \to XY) \geq \tau$, and for which the following
holds: the only rule $X' \to X'Y'$ with $X' \cap Y' = \emptyset$, $c(X' \to X'Y') \geq \gamma$, $s(X' \to X'Y') \geq \tau$
which satisfies that $X' \subseteq X$ and $Y \subseteq Y'$ is itself: $X = X'$ and $Y = Y'$.

The following holds [Kryszkiewicz 1998c]: 

Proposition 7.2. For a confidence threshold $\gamma$ and a support threshold $\tau$, all $\mathrm{MMR}_{\tau,\gamma}$
rules are representative rules for these thresholds.

Let us point out that these rules are subtly different from the min-max approximate basis 
of [Pasquier et al. 2005], given in Definition 2.3, their apparent similarity notwithstanding. 
There, the closed set forming the whole right-hand side is to be maximal, including the 
antecedent; here, only the part of the closed set that does not belong to the antecedent is 
to be maximal. As the antecedent is itself minimal, the notions differ. In a sense, MMR are 
to min-max rules as confidence boost is to confidence width. 

Example 7.3. In our running example, we find that rule $BC \to A$ has confidence $\gamma = 8/9$.
It is a representative rule at its confidence threshold $\gamma = 8/9$, hence it is a min-max rule by
Proposition 2.5; but it is not in $\mathrm{MMR}_{\tau,\gamma}$ since $c(B \to A) = 10/11 > \gamma$. This example also
proves that the converse of Proposition 7.2 does not hold.

As discussed in depth in Subsection 2.2, we must be aware that MMR's may lose in- 
formation, since rules that have nonminimal antecedents may be actually irredundant and 
potentially interesting. Our main proposal in this paper, confidence boost, can be interpreted 
as a quantitative variant of MMR's, whereby nonminimal antecedents or nonmaximal
consequents are likely to be considered not novel (and conversely), yet this connection depends
on how well the rule clears the confidence and support thresholds. More precisely:

Proposition 7.4. Fix support and confidence thresholds $\tau$ and $\gamma$.

(1) If $X \to Y$ is an $\mathrm{MMR}_{\tau,\gamma}$ rule, then $\beta(X \to Y) \geq \min\left(\frac{s(X \to Y)}{\tau}, \frac{c(X \to Y)}{\gamma}\right)$.

(2) If $X \to Y$ is not an $\mathrm{MMR}_{\tau,\gamma}$ rule, then $\beta(X \to Y) \leq \frac{c(X \to Y)}{\gamma}$.

Proof.

(1) Consider an $\mathrm{MMR}_{\tau,\gamma}$ rule $X \to Y$. Any different rule $X' \to Y'$ with $X' \subseteq X$ and
$Y \subseteq Y'$ must fail either the support threshold $\tau$ or the confidence threshold $\gamma$. First
we show that, for such a rule, $c(X' \to Y') \leq \max\left(\frac{\tau}{s(X)}, \gamma\right)$, considering two cases.

Assume $X' \neq X$, and consider rule $X' \to Y$, which is also different from $X \to Y$.
We have $s(X'Y) \geq s(XY) \geq \tau$ so that it must fail the confidence threshold; hence,
$c(X' \to Y') \leq c(X' \to Y) < \gamma \leq \max\left(\frac{\tau}{s(X)}, \gamma\right)$. Assume now $X' = X$: either
$c(X' \to Y') < \gamma$, or $X' \to Y'$ fails the support threshold, $s(X'Y') = s(XY') < \tau$, whence
$c(X' \to Y') = \frac{s(XY')}{s(X)} < \frac{\tau}{s(X)}$; thus $c(X' \to Y') \leq \max\left(\frac{\tau}{s(X)}, \gamma\right)$ again.

Now we can bound the confidence boost easily: any rule considered for the
maximization in the denominator of the definition of confidence boost has confidence at
most $\max\left(\frac{\tau}{s(X)}, \gamma\right)$, and there are finitely many of them, so that the denominator itself
obeys the same bound. Since $c(X \to Y)\, s(X) = s(XY) = s(X \to Y)$, this implies that
$\beta(X \to Y) \geq \min\left(\frac{c(X \to Y)}{\gamma}, \frac{c(X \to Y)\, s(X)}{\tau}\right) = \min\left(\frac{s(X \to Y)}{\tau}, \frac{c(X \to Y)}{\gamma}\right)$.

(2) This part is quite simple. If $X \to Y$ is not an $\mathrm{MMR}_{\tau,\gamma}$ rule, then there must exist
some different rule $X' \to Y'$ with $X' \subseteq X$ and $Y \subseteq Y'$ passing the support and
confidence thresholds; this rule enters the maximization in the denominator of the
definition of confidence boost, which is, then, at least $\gamma$, resulting in a confidence boost
of at most $\frac{c(X \to Y)}{\gamma}$.

That is: a rule that is not an $\mathrm{MMR}_{\tau,\gamma}$ rule, and barely clears the confidence threshold $\gamma$,
can be appropriately pruned as not novel due to low boost; but, if its confidence is much
higher than the threshold, even if it is not MMR, it may exhibit enough novelty to make
it debatable whether it must be pruned off the output. Conversely, an $\mathrm{MMR}_{\tau,\gamma}$ rule that
barely clears the support and confidence thresholds may turn out to be of low confidence
boost, and it could be better to omit it from the output. Essentially, the same purpose is
attempted by both approaches, but confidence boost bounds offer a quantitative evaluation
of the extent to which representative rules are appropriate as rules to choose for the output of
the mining process: they will often coincide with the $\mathrm{MMR}_{\tau,\gamma}$ rules, but these will occasionally
be inadequate.
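
The relationship between MMR rules and confidence boost can be explored on toy data with a brute-force sketch such as the following. It assumes, as the proof of Proposition 7.4 suggests, that the plain confidence boost of $X \to Y$ is its confidence divided by the maximum confidence among different rules $X' \to Y'$ with $X' \subseteq X$, $Y \subseteq Y'$ and $X' \cap Y' = \emptyset$; all function names are hypothetical, supports are absolute counts, and the enumeration is exponential, so this is in no way the efficient algorithmics developed earlier in the paper.

```python
from itertools import combinations


def subsets(s):
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1)
            for c in combinations(s, k)]


def supp(data, itemset):
    return sum(1 for t in data if itemset <= t)


def conf(data, x, y):
    return supp(data, x | y) / supp(data, x)


def competitors(data, x, y):
    """Rules X'->Y' other than X->Y with X' inside X and Y inside Y'."""
    universe = frozenset().union(*data)
    for x2 in subsets(x):
        for extra in subsets(universe - x2 - y):
            y2 = y | extra
            if (x2, y2) != (x, y):
                yield x2, y2


def boost(data, x, y):
    best = max((conf(data, x2, y2) for x2, y2 in competitors(data, x, y)),
               default=0.0)
    # No competitor of positive confidence: treat the boost as unbounded.
    return conf(data, x, y) / best if best > 0 else float("inf")


def is_mmr(data, x, y, tau, gamma):
    def passes(a, b):
        return supp(data, a | b) >= tau and conf(data, a, b) >= gamma
    return passes(x, y) and not any(
        passes(x2, y2) for x2, y2 in competitors(data, x, y))
```

On a small dataset one can enumerate every rule that passes the thresholds and compare `boost` against the two bounds of Proposition 7.4, which is a convenient sanity check when experimenting with alternative definitions.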

7.3. Further Work 

Of course, the use of confidence boost does not preclude a combination with lift or any other 
measure of intensity of implication; to what extent these separate measures interact with 
confidence boost, and which ones perform best, is one among many open lines of future 
research. 

Indeed, whatever method is proposed to reduce the output of an association miner leaves 
a major doubt: are these the rules one really wants? We plan to continue working on 
this rather subjective issue, and intend to employ further actual end-user evaluations from 
dataset providers, as we have started to do with respect to partial aspects. We are working on 
datasets coming from an e-learning platform, for which we have a manually recorded labeling 
of the interest of each rule, provided by the dataset suppliers, namely, the teachers of the 
courses where the datasets originated, who are also available for consultation. The particular 
characteristics of this dataset require us first to extend our approach into handling both 
presence and absence of each item [Balcazar et al. 2010a; 2010b]. Also, sometimes, some 
of the full-confidence implications would be desirable indeed for inclusion in the output, 
given that working on the basis B* leaves them fully out; however, it is unclear whether 
confidence boost would still be the right notion, and, even so, full-confidence implications
require computing the minimal generators of each closure, therefore losing the desirable
advantage offered by closure-based confidence boost operating on top of B* rules, which can 
be computed much faster since they only use the closures lattice. We continue to investigate 
this problem, and some partial progress, on which we still hope to improve, is reported in 
[Balcazar et al. 2010b]. 

The yacaree tool has many developments open to further work. First, since we mine 
frequent closures in descending support, instead of ascending, some of the optimizations 
in ChARM require further work before being readily applicable; also, the best algorithm 
in [Baixeries et al. 2009] (namely iPred) to compute Hasse edges is not applied, as it as- 
sumes a cardinality-ordered traversal of the closed sets instead of a support-oriented one; 
the theorems that guarantee its applicability have been obtained only recently, and a forth- 
coming version of yacaree will sport this faster algorithm, iPred. Also, it seems possible 
that a smarter coupling of the miner with the lattice computation might provide further 
accelerations. On the other hand, from the point of view of the user, and beyond efficiency
improvement considerations, a few alternative internal configurations of the parameters
might prove useful, provided one can hit upon intuitive descriptions that make
them clearly understandable by nonexperts: indeed, whereas the user is grateful for being
able to run the program with no parameter selection, yacaree is not snake oil, and it is
likely that, for certain datasets, and after seeing the result, the user may be tempted to "try
again" in some alternative way.

Hence, we will work next on improving the speed of the system, on finding sensible ways
of reporting interesting full-confidence implications without paying too much as a time 
overhead, and on developing interactions with end users to study their evaluations of the 
generated sets of rules, possibly leading thus to further refinements of the confidence boost 
notion and of any other aspect that might be considered. In the meantime, researchers 
interested in conducting their own evaluation can download the system freely and analyze 
the output of confidence-boost-bounded mining on their datasets; this author would be
grateful to be informed of the results.